Potential biases of incomplete mixed models in the estimation of variance components,
heritability,  and  the  prediction  of  breeding  gains  are  theoretically  formulated  based  on 
balanced  data.  For  a given  incomplete  mixed  model,  the  magnitudes  of  biases  are  functions 
of  population  genetic  architecture,  mating  design,  and  field  experimental  designs,  which  can 
be  precisely  assessed  using  the  derived  formulae.  It  was  found  that  most  incomplete  mixed 
models  over-estimate  additive  genetic  variance,  resulting  in  upward-biased  heritability  and 
inflated  genetic  gains.  The  relative  consequence  of  bias  is  severe  for  traits  under  weak 
additive  genetic  control  with  the  strong  influence  of  non-additive  genetic  effects.  For 
incomplete  mixed  models  ignoring  additive  genetic  effects  (GCA)  x environment  (E) 
interactions,  the  potential  biases  are  linearly  related  to  the  number  of  environments  included 
in the data. For incomplete mixed models ignoring dominance effects, biases are linearly
proportional to the number of crosses in which each parent is involved. For pure additive genetic
models  ignoring  both  dominance  effects  and  GCA  x E interaction,  the  biases  are  cumulative 
and  can  be  as  high  as  60%  of  the  true  parameter.  For  unbalanced  data,  the  formulae  can  be 
used  to  approximate  the  minimum  biases  for  a given  incomplete  mixed  model  by  substituting 
for  the  average  number  of  design  parameters  of  an  experiment. 

The  search  for  optimal  statistical  methods  in  estimating  type  B genetic  correlations  is 
begun  by  developing  a new  univariate  approach.  The  new  method  estimates  type  B genetic 
correlations  using  predicted  parental  GCA  effects  with  the  technique  of  best  linear  unbiased 
prediction  (BLUP)  in  each  individual  environment.  Numerical  comparisons  using  simulated 
forest genetic data with various genetic architectures and degrees of data imbalance have demonstrated
its  unbiasedness,  better  match  to  underlying  true  population  parameters,  and  suitability  to 
various  experimental  designs  and  data  imbalance. 

The  unbiasedness  and  precision  of  multivariate  methods  in  estimating  type  B genetic 
correlations  are  also  investigated  with  a simulation  study.  It  was  concluded  that  constrained 
multivariate  methods  produce  empirically  unbiased  estimates  of  type  B genetic  correlations 
which  have  higher  estimation  precision,  especially  when  heritabilities  of  traits  are  low  in  the 
concerned environments. The practical importance of keeping estimates within the parameter
space, together with other advantages, makes the constrained multivariate method a desirable
choice.

CHAPTER 1
INTRODUCTION 


For  genetic  evaluation  of  quantitative  traits,  mixed  model  methods  have  not  only 
become  the  method  of  choice  in  animal  breeding  (Henderson  1953,  1973,  1984;  Mrode 
1996),  but  also  gained  increasing  popularity  in  tree  improvement  programs  (White  and 
Hodge 1989; Huber 1993; Borralho and Wilson 1994; Jarvis et al. 1995; Dieters et al. 1995;
Ericsson  and  Danell  1995;  Wei  and  Borralho  1998).  One  of  the  important  applications  of 
mixed linear model theory is the technique of best linear unbiased prediction (BLUP) of
breeding values, which has shown properties superior to those of traditional fixed
genetic effect models for handling complex, messy, and unbalanced data structures.

While  the  theoretical  developments  of  BLUP  have  been  well  established  (Henderson 
1953, 1973, 1984; Searle et al. 1992; Mrode 1996), practical applications of BLUP in breeding
programs were hindered by its computational demands (Searle et al. 1992). Although BLUP
has  now  become  a routine  procedure  in  animal  breeding  due  to  the  development  of  faster 
computers  and  more  efficient  algorithms,  its  applications  in  forest  genetic  evaluation  are 
relatively  new  (White  and  Hodge  1989).  The  lack  of  computer  software  suited  to  the  data 
properties  of  forest  genetic  experiments  often  leads  tree  breeders  to  follow  those  approaches 
adopted  in  animal  breeding,  resulting  in  the  use  of  incomplete  mixed  linear  models  with 
respect to forest genetic experiments. Although such model specifications may be adequate
for animal genetic data (Varona et al. 1997), their adequacy for predicting tree breeding
values remains in question.

The  first  part  of  this  dissertation  places  emphasis  on  examining  the  effects  of 
incomplete  mixed  linear  models  on  genetic  parameter  estimation  and  parental  breeding  value 
prediction  in  forest  genetic  data  analyses.  Knowledge  is  needed  about  the  potential  biases 
that  may  result  from  the  use  of  incomplete  mixed  linear  models  because  estimates  of  genetic 
parameters  (such  as  narrow  sense  heritability,  type  B genetic  correlation,  and  the  ratio  of 
dominance variance to additive variance) are major considerations in creating a tree breeding
strategy  (Zobel  and  Talbert  1984;  White  et  al.  1993),  and  predicted  breeding  values  are  the 
basis  for  ranking  candidates,  making  selections,  and  evaluating  genetic  progress  in  a 
breeding  program  (White  and  Hodge  1989;  White  et  al.  1993;  Borralho  and  Dutkowski 
1998). 

The  potential  biases  from  incomplete  mixed  models  in  genetic  parameter  estimation 
and  parental  breeding  value  prediction  are  most  easily  evaluated  with  balanced  data.  Since 
estimates  of  genetic  parameters  and  predicted  breeding  values  are  functions  of  variance 
components  estimated  from  data  samples  (Falconer  1989;  Mrode  1996),  the  essential 
questions  of  biases  can  be  answered  once  the  effects  of  incomplete  mixed  linear  models  on 
the  estimation  of  variance  components  are  determined.  Due  to  the  property  of  orthogonality 
among  experimental  factors  when  data  are  balanced,  closed  forms  of  biases  for  variance 
component estimates can be derived in terms of design variables of experiments and true
variance  components  of  breeding  populations. 

While  the  assessment  of  biases  based  on  balanced  data  provides  useful  information 
on the acceptability of different incomplete mixed linear models in forest genetic data
analysis, the possible magnitudes of biases from incomplete models remain unclear
when data are highly unbalanced. It is also of interest whether the formulae derived
from balanced data can be used to approximate biases for unbalanced data. These questions are
approached  by  simulation  studies  considering  typical  scenarios  of  forest  genetic  tests  with 
different  mating  designs  (field  experimental  design  is  restricted  to  the  randomized  complete 
block  design  with  single-tree  plots),  data  imbalance,  and  a range  of  genetic  architectures 
reported  for  major  forest  species  (Yeh  and  Heaman  1987;  Adams  et  al.  1994;  Dieters  et  al. 
1995;  Li  et  al.  1996).  The  ‘goodness  of  fit’  for  approximated  biases  based  on  theoretical 
formulae  is  judged  by  the  regression  of  realized  empirical  biases  from  actual  unbalanced  data 
analyses  on  the  theoretically  approximated  biases. 

Defined  as  the  genetic  correlation  for  the  same  trait  measured  in  different 
environments, type B genetic correlation (Yamada 1962; Burdon 1977) is an important
genetic parameter in tree breeding programs. The uses of type B genetic correlation in
forest tree breeding include its applications in the quantitative study of
genotype-by-environment interaction (Burdon 1977; Johnson and Burdon 1990; Woolaston et al. 1991;
Adams  et  al.  1994;  Dieters  et  al.  1995;  Pswarayi  et  al.  1997)  and  its  applications  in  indirect 
selection  (Jiang  1985;  White  and  Hodge  1989;  Wu  1993;  Johnson  1997).  Practical 
implications  of  type  B genetic  correlation  in  tree  breeding  include  its  influence  on  the 
determination  of  breeding  zones  (Johnson  1997),  the  deployment  of  genetically  improved 
materials,  and  the  estimation  of  genetic  gains  from  indirect  selection  (White  and  Hodge 
1989). 

Statistical  methods  for  estimating  type  B genetic  correlations  have  been  well 
established for balanced data with univariate approaches (Yamada 1962; Burdon 1977).
Theoretical considerations and empirical results, however, have questioned the general utility
of traditional methods when data are highly unbalanced and variances are heterogeneous
across environments (Fernando et al. 1984). Given the advances in statistical methods and
computing software, better approaches are needed to overcome those inadequacies.

The investigation of optimal methods for estimating type B genetic correlation
begins with the development of a new univariate method. A univariate approach is theoretically
developed based on predicted parental GCA effects (i.e., one-half of the predicted parental
breeding values). The practical applications of the new approach in type B genetic
correlation  estimation  are  compared  with  existing  univariate  methods  using  computer 
simulated  data  sets  with  various  types  of  data  imbalance  and  genetic  architectures. 

Although  univariate  approaches  may  produce  theoretically  unbiased  estimates  of  type 
B genetic  correlations,  large  sampling  errors  of  variance  component  estimates  for  data  sets 
from each individual environment often lead to estimates of type B genetic correlation that fall
outside the theoretical parameter space, especially when heritabilities of the trait under investigation
are low in one or both of the paired environments. For this reason, multivariate methods are
exploited  to  put  data  sets  from  different  environments  into  a closed  analytical  system  which 
will  estimate  genetic  variances  and  covariances  simultaneously  and  restrain  the  estimates  of 
type  B genetic  correlations  within  the  theoretical  parameter  space.  Although  restricted 
maximum  likelihood  (REML)  estimation  of  type  B genetic  correlations  from  the  multivariate 
approach  can  constrain  the  estimates  within  the  parameter  space,  the  properties  of 
unbiasedness  and  minimal  variances  of  the  estimates  are  not  theoretically  known.  These  are 
assessed  in  this  study  based  on  computer  simulated  data  with  known  population  parameters. 
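
The parameter-space problem described above can be illustrated numerically. The sketch below assumes the traditional balanced-data form of the type B genetic correlation, namely the ratio of the family variance to the sum of the family and family × environment variances. That formula, the function name `type_b_correlation`, and the input values are illustrative assumptions rather than material taken from this study.

```python
from fractions import Fraction


def type_b_correlation(var_family, var_family_by_env):
    """Type B correlation under the assumed ratio var_f / (var_f + var_fe)."""
    return var_family / (var_family + var_family_by_env)


# Hypothetical variance component estimates (not data from this study).
# In the second call, sampling error has pushed the interaction estimate below zero.
inside = type_b_correlation(Fraction(1, 2), Fraction(1, 10))
outside = type_b_correlation(Fraction(1, 2), Fraction(-1, 10))
print(inside, outside)
```

With an unconstrained negative estimate of the interaction component, the ratio exceeds one, which is the kind of out-of-range estimate that motivates the constrained multivariate approach.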


CHAPTER  2 

POTENTIAL  BIASES  OF  INCOMPLETE  LINEAR  MODELS 
IN  GENETIC  PARAMETER  ESTIMATION  AND  BREEDING 
VALUE  PREDICTION:  BALANCED  DATA 

Introduction 

Estimating  genetic  parameters  and  predicting  breeding  values  (BVs)  are  among  the 
primary  objectives  of  forest  genetic  tests  in  tree  breeding  programs  (Zobel  and  Talbert  1984; 
White  1987, 1996).  Estimates  of  genetic  parameters  are  important  considerations  in  creating 
breeding  strategies  (Zobel  and  Talbert  1984;  White  et  al.  1993)  and  predicted  breeding  values 
are  the  basis  for  ranking  candidates,  making  selections  and  predicting  genetic  gains 
(Henderson  1973,  1977,  1984;  Falconer  1981;  White  and  Hodge  1989).  Proper  analysis  of 
data  from  forest  genetic  tests  is  necessary  for  obtaining  the  accurate  estimates  of  genetic 
parameters  crucial  for  maximizing  genetic  progress  in  breeding  programs. 

From  quantitative  genetics  theory,  it  is  well  known  that  the  estimation  of  heritability 
and  the  prediction  of  breeding  values  are  directly  affected  by  the  estimation  of  variance 
components,  especially  the  estimates  of  additive  genetic  variances  (Falconer  1981;  Mrode 
1996).  For  a given  set  of  experimental  data,  the  estimates  of  variance  components  are 
affected  by  both  statistical  procedures  (Searle  et  al.  1992)  and  analytical  linear  models  being 
used  (Giertych  and  Van  De  Sype  1990;  Wei  and  van  der  Werf  1993).  To  obtain  unbiased 
estimates of variance components, forest genetic data analyses have traditionally used as
complete a linear model (i.e., a full linear model) as possible (Campbell et al. 1986;
Bongarten  and  Hanover  1986;  Hodge  and  White  1992;  Adams  et  al.  1994;  Huber  et  al.  1994; 
Dieters  et  al.  1995).  These  full  analytical  linear  models  consider  all  environmental  and 
genetic  effects  that  may  affect  the  measurements  of  the  trait(s)  under  investigation,  so  that 
maximum  information  is  extracted  from  the  data.  However,  for  experimental  data  arising 
from  complex  genetic  structures  and  field  experimental  designs,  full  mixed  linear  models 
become increasingly computationally demanding as the number of levels associated with the
model terms increases.

In recent practical applications of mixed linear model methods to forest genetic
evaluation, high computational demands associated with large populations, as well as other
analytical difficulties, sometimes lead to the use of incomplete mixed models with respect to
forest genetic experimental designs, in which non-additive genetic effects and/or
genotype-by-environment (G × E) interactions are omitted from the full mixed linear models
(Jarvis  et  al.  1995;  Araujo  et  al.  1996).  Although  it  is  known  that  ignoring  these  effects  can 
potentially  cause  biases  to  the  estimation  of  additive  genetic  variance,  heritability,  breeding 
values  and  genetic  gains  (Henderson  1985;  Wei  and  van  der  Werf  1993;  Quinton  and  Smith 
1997),  the  potential  risks  of  using  incomplete  mixed  linear  models  in  forest  genetic  data 
analyses  are  not  well  understood.  The  objectives  of  this  study  are:  (1)  to  formulate 
theoretically  the  potential  biases  of  estimates  of  additive  genetic  variances,  heritabilities  and 
predicted  breeding  values  for  balanced  data  resulting  from  the  use  of  incomplete  mixed  linear 
models; and (2) to demonstrate the effects of mating design and field experimental design

Methods

Theoretical  Formulation  of  Biases  for  Balanced  Data 

It  is  known  that  for  balanced  data,  several  variance  component  estimation  techniques, 
including  ANOVA  based  estimators  (such  as  Henderson’s  type  I,  type  II,  and  type  III)  and 
restricted  maximum  likelihood  estimation  (REML),  yield  identical  solutions  for  variance 
components  for  any  mixed  or  random  linear  model  (Searle  et  al.  1992).  While  REML 
solution  of  variance  component  estimates  requires  the  assumption  of  normality  and  is  an 
iterative  procedure,  ANOVA  based  estimators  are  obtained  straightforwardly  by:  (i)  equating 
the  calculated  mean  squares  (or  sum  of  squares)  to  their  pertinent  expectations,  and  (ii) 
solving  the  set  of  linear  equations  (Henderson  1953;  Searle  et  al.  1992). 
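
The two ANOVA steps above can be sketched in a few lines. The example assumes the expected mean squares of a balanced half-sib test in a randomized complete block design with multiple-tree plots (family, family × environment, plot, and residual strata); the function names and the numeric variance components are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def expected_mean_squares(s2, t, b, n):
    """Expected mean squares for the assumed balanced half-sib RCB layout."""
    plot = s2["e"] + n * s2["p"]
    fxe = plot + b * n * s2["fe"]
    return {
        "residual": s2["e"],
        "plot": plot,
        "fxe": fxe,
        "family": fxe + t * b * n * s2["f"],
    }


def anova_variance_components(ms, t, b, n):
    """Equate mean squares to their expectations and solve from the bottom stratum up."""
    s2_e = ms["residual"]
    s2_p = (ms["plot"] - ms["residual"]) / n
    s2_fe = (ms["fxe"] - ms["plot"]) / (b * n)
    s2_f = (ms["family"] - ms["fxe"]) / (t * b * n)
    return {"f": s2_f, "fe": s2_fe, "p": s2_p, "e": s2_e}


# Hypothetical true components; feeding the expectations back in recovers them exactly.
true = {"f": Fraction(1), "fe": Fraction(1, 2), "p": Fraction(1, 4), "e": Fraction(8)}
ms = expected_mean_squares(true, t=4, b=5, n=3)
estimates = anova_variance_components(ms, t=4, b=5, n=3)
print(estimates)
```

Because the system is triangular, each component is obtained by differencing adjacent mean squares; no iteration is needed, in contrast to REML.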

In  this  study,  formulae  for  estimating  biases  in  terms  of  design  variables  and  variance 
components  were  theoretically  derived  based  on  the  ANOVA  approach.  Biases  of  estimates 
of  additive  genetic  variance,  heritability,  and  predicted  parental  breeding  values  were 
formulated following the statistical relationship between a full and an incomplete mixed
linear model for a given experimental data set. This was achieved by first estimating
variance  components  for  a given  experimental  design  based  on  the  expected  mean  squares 
of  a full  mixed  linear  model.  The  estimates  of  additive  genetic  variance  components, 
heritability  and  the  predicted  parental  breeding  values  from  the  full  model  were  thus  regarded 
as the unbiased estimates or predictions. Then, the sum of squares (as well as its associated
degrees of freedom) of the ignored effect(s) from the full model was pooled into that of the
appropriate remaining effect(s) in the incomplete model. This leads to the reconstruction of
expected mean squares for effects retained in the incomplete mixed linear model and the
re-estimation of variance components. Finally, the differences between the full and incomplete
models  in  the  estimates  of  additive  genetic  variance,  heritability,  and  predicted  parental 
breeding  values  were  defined  as  the  theoretical  biases. 

Relationship  Between  Full  and  Incomplete  Linear  Models 

For  analysis  of  variance  with  balanced  data,  detailed  rules  have  been  established  for 
calculating  sum  of  squares,  partitioning  degrees  of  freedom,  and  deriving  expected  mean 
squares  for  each  effect  in  a full  linear  model  (Montgomery  1991;  Searle  et  al.  1992).  Abiding 
by  these  rules,  a full  linear  model  for  an  experiment  with  half-sib  families  tested  in  a 
randomized  complete  block  design  with  multiple-tree  plots  can  be  written  as: 

\[ y_{ijkl} = \mu + E_i + B_{ij} + f_k + fe_{ik} + p_{ijk} + e_{ijkl} \tag{2-1} \]

where

$y_{ijkl}$ is the observation of the $l$th tree within the $k$th family in the $j$th block of the $i$th
environment;

$\mu$ is the overall mean;

$E_i$ is the fixed effect of the $i$th environment, $i = 1, \ldots, t$;

$B_{ij}$ is the fixed effect of the $j$th block within the $i$th environment, $j = 1, \ldots, b$;

$f_k$ is the random effect of the $k$th family, $f_k \sim \mathrm{NID}(0, \sigma_f^2)$, $k = 1, \ldots, f$;

$fe_{ik}$ is the random effect of family × environment interaction, $fe_{ik} \sim \mathrm{NID}(0, \sigma_{fe}^2)$;

$p_{ijk}$ is the random effect of family × block(environment) interaction (i.e., the so-called
plot effect), $p_{ijk} \sim \mathrm{NID}(0, \sigma_p^2)$;

$e_{ijkl}$ is the random effect of the $l$th tree within the $k$th family in the $j$th block of the $i$th
environment, $e_{ijkl} \sim \mathrm{NID}(0, \sigma_e^2)$, $l = 1, \ldots, n$.

Assuming  zero  covariance  among  random  effects,  the  appropriate  sum  of  squares, 
degrees  of  freedom,  and  expected  mean  squares  for  each  effect  are  obtained  in  Table  2-1. 

When  an  incomplete  model  (ignoring  one  or  more  effects  from  the  full  linear  model) 
is  applied  to  the  same  experimental  data,  these  established  rules,  however,  become 
inadequate because they fail to clarify (i) how sums of squares and the degrees of freedom
associated with ignored effects will be relocated and (ii) how the relocation of sums of
squares and degrees of freedom of dropped effects may affect the expected mean squares in
the incomplete model. For example, if the effect of family × environment interaction ($fe_{ik}$)
is dropped from Eq. 2-1, the incomplete model then becomes:

\[ y_{ijkl} = \mu + E_i + B_{ij} + f_k + p_{ijk} + e_{ijkl} \tag{2-2} \]

and  the  correct  ANOVA  table  cannot  be  obtained  using  the  rules  set  out  for  full  linear 
models. 

An effective way to determine the relocation of sums of squares in an incomplete
linear model is to examine the structures of sums of squares and expected mean squares
calculated under the full model. When an effect is assumed unimportant and consequently
dropped from a full linear model, it essentially implies that observed variation due to the
dropped effect is assumed to be zero. Specifically, in the above incomplete model, when the
effect of family × environment interaction ($fe_{ik}$) is dropped from the full model, it is assumed
that $\sigma_{fe}^2 = 0$.
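
Table 2-1. Sums of squares and expected mean squares for the full and incomplete mixed linear models in an analysis of half-sib families tested with a randomized complete block design and multiple-tree plots.

Full model

| Source of variation | df | Sum of squares | Expected mean square |
|---|---|---|---|
| Envir. (E) | $t-1$ | $SS_E = bfn \sum_{i=1}^{t} (\bar{y}_{i...} - \bar{y}_{....})^2$ | |
| Block (B/E) | $t(b-1)$ | $SS_{B/E} = fn \sum_{i=1}^{t}\sum_{j=1}^{b} (\bar{y}_{ij..} - \bar{y}_{i...})^2$ | |
| Family (F) | $f-1$ | $SS_F = tbn \sum_{k=1}^{f} (\bar{y}_{..k.} - \bar{y}_{....})^2$ | $\sigma_e^2 + n\sigma_p^2 + bn\sigma_{fe}^2 + tbn\sigma_f^2$ |
| F × E | $(t-1)(f-1)$ | $SS_{F \times E} = bn \sum_{i=1}^{t}\sum_{k=1}^{f} (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})^2$ | $\sigma_e^2 + n\sigma_p^2 + bn\sigma_{fe}^2$ |
| F × B/E | $t(b-1)(f-1)$ | $SS_{F \times B/E} = n \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...})^2$ | $\sigma_e^2 + n\sigma_p^2$ |
| Residual | $tbf(n-1)$ | $SS_{Res} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f}\sum_{l=1}^{n} (y_{ijkl} - \bar{y}_{ijk.})^2$ | $\sigma_e^2$ |
| Total | $tbfn-1$ | $SS_{Total} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f}\sum_{l=1}^{n} (y_{ijkl} - \bar{y}_{....})^2$ | |

Incomplete model

| Source of variation | df | Sum of squares | Expected mean square |
|---|---|---|---|
| Envir. (E) | $t-1$ | $SS_E = bfn \sum_{i=1}^{t} (\bar{y}_{i...} - \bar{y}_{....})^2$ | |
| Block (B/E) | $t(b-1)$ | $SS_{B/E} = fn \sum_{i=1}^{t}\sum_{j=1}^{b} (\bar{y}_{ij..} - \bar{y}_{i...})^2$ | |
| Family (F) | $f-1$ | $SS_F = tbn \sum_{k=1}^{f} (\bar{y}_{..k.} - \bar{y}_{....})^2$ | $\sigma_{e*}^2 + n\sigma_{p*}^2 + tbn\sigma_{f*}^2$ |
| F × B/E | $(tb-1)(f-1)$ | $SS_{F \times B/E} = n \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{..k.} + \bar{y}_{....})^2$ | $\sigma_{e*}^2 + n\sigma_{p*}^2$ |
| Residual | $tbf(n-1)$ | $SS_{Res} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f}\sum_{l=1}^{n} (y_{ijkl} - \bar{y}_{ijk.})^2$ | $\sigma_{e*}^2$ |
| Total | $tbfn-1$ | $SS_{Total} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f}\sum_{l=1}^{n} (y_{ijkl} - \bar{y}_{....})^2$ | |
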
After setting the variance component $\sigma_{fe}^2 = 0$ in the expected mean squares of the full
linear model, it is quickly found that family × environment interaction (F × E) would have
the same expected mean square as that of family × block(environment) interaction (i.e.,
F × B/E). Therefore, when the effect of family × environment interaction is dropped from the
full model, its associated sum of squares and degrees of freedom are likely to be pooled into
the effect of family × block(environment) interaction which, in this case, is confirmed
because
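
As a small illustration of the argument above, the expected mean squares of the multiple-tree-plot design can be stored as coefficient tuples and the family × environment coefficient set to zero. This is only a sketch; the function names and the tuple layout are hypothetical conveniences, not notation from the source.

```python
def ems_coefficients(t, b, n):
    """Coefficients of (sigma2_e, sigma2_p, sigma2_fe, sigma2_f) in each full-model EMS."""
    return {
        "F": (1, n, b * n, t * b * n),
        "FxE": (1, n, b * n, 0),
        "FxB/E": (1, n, 0, 0),
        "Residual": (1, 0, 0, 0),
    }


def drop_fxe(coeffs):
    """Mimic the incomplete model by zeroing the sigma2_fe coefficient everywhere."""
    return {src: (e, p, 0, f) for src, (e, p, fe, f) in coeffs.items()}


full = ems_coefficients(t=3, b=4, n=5)
reduced = drop_fxe(full)
# After dropping the interaction, F x E and F x B/E share the same expectation.
same_expectation = reduced["FxE"] == reduced["FxB/E"]
print(same_expectation)
```

Two strata with identical expectations cannot be distinguished by the reduced model, which is why the sum of squares of the dropped effect ends up in the family × block(environment) stratum.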

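\[
\begin{aligned}
n \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} &(\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{i.k.} + \bar{y}_{i...})^2 \\
&= n \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} \left[ (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{..k.} + \bar{y}_{....}) - (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....}) \right]^2 \\
&= n \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{..k.} + \bar{y}_{....})^2 - bn \sum_{i=1}^{t}\sum_{k=1}^{f} (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})^2 .
\end{aligned}
\tag{2-3}
\]

Clearly, when $SS_{F \times E} = bn \sum_{i=1}^{t}\sum_{k=1}^{f} (\bar{y}_{i.k.} - \bar{y}_{i...} - \bar{y}_{..k.} + \bar{y}_{....})^2$ is not deducted in the calculated
sum of squares for the family × block(environment) interaction in the incomplete model, that
sum of squares is increased by exactly the same amount as the sum of squares of the family ×
environment interaction in the full model.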
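
The pooling identity can also be checked numerically on arbitrary balanced data. The following sketch fills a balanced layout with random numbers (the dimensions, the seed, and the helper names are arbitrary assumptions) and computes the three sums of squares directly from cell and marginal means.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pooling_check(t=3, b=4, f=5, n=2, seed=1):
    """Return (SS_FxE, full-model SS_FxB/E, incomplete-model SS_FxB/E)."""
    rng = random.Random(seed)
    # y[i][j][k][l]: tree l of family k in block j of environment i (arbitrary values)
    y = [[[[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(f)]
          for _ in range(b)] for _ in range(t)]
    cells = [(i, j, k) for i in range(t) for j in range(b) for k in range(f)]
    m_ijk = {(i, j, k): mean(y[i][j][k]) for i, j, k in cells}
    m_ij = {(i, j): mean(m_ijk[i, j, k] for k in range(f))
            for i in range(t) for j in range(b)}
    m_ik = {(i, k): mean(m_ijk[i, j, k] for j in range(b))
            for i in range(t) for k in range(f)}
    m_i = [mean(m_ij[i, j] for j in range(b)) for i in range(t)]
    m_k = [mean(m_ik[i, k] for i in range(t)) for k in range(f)]
    grand = mean(m_i)
    ss_fxe = b * n * sum((m_ik[i, k] - m_i[i] - m_k[k] + grand) ** 2
                         for i in range(t) for k in range(f))
    ss_full = n * sum((m_ijk[c] - m_ij[c[0], c[1]] - m_ik[c[0], c[2]] + m_i[c[0]]) ** 2
                      for c in cells)
    ss_incomplete = n * sum((m_ijk[c] - m_ij[c[0], c[1]] - m_k[c[2]] + grand) ** 2
                            for c in cells)
    return ss_fxe, ss_full, ss_incomplete


ss_fxe, ss_full, ss_incomplete = pooling_check()
print(abs(ss_incomplete - (ss_full + ss_fxe)))
```

Up to floating-point rounding, the incomplete-model sum of squares equals the full-model sum of squares plus the family × environment sum of squares, whatever values are placed in the cells.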

Now,  consider  the  effect  of  the  relocation  of  sum  of  squares  of  an  ignored  effect  on 
the  expected  mean  square  of  the  recipient  effect.  It  can  be  shown  that  in  the  above 
incomplete  model,  the  expected  sum  of  squares  for  the  recipient  effect  [i.e.,  family  x 
block(environment)  interaction  effect]  is: 

\[
E(SS^*_{F \times B/E}) = E\left[ n \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (\bar{y}_{ijk.} - \bar{y}_{ij..} - \bar{y}_{..k.} + \bar{y}_{....})^2 \right] = (f-1)(tb-1)(\sigma_{e*}^2 + n\sigma_{p*}^2) \quad \text{(see Appendix 2-1)} \tag{2-4}
\]

Thus,

\[
E(MS^*_{F \times B/E}) = \frac{1}{(f-1)(tb-1)} E(SS^*_{F \times B/E}) = \sigma_{e*}^2 + n\sigma_{p*}^2 , \tag{2-5}
\]

where the asterisk marks quantities defined under the incomplete model.

Note that $(f-1)(tb-1) = (t-1)(f-1) + t(b-1)(f-1)$, which further suggests that the
degrees of freedom associated with family × environment interaction are also pooled into the
family × block(environment) interaction in the incomplete model. Comparing the expected
mean square in the incomplete model (Eq. 2-5) with that from the full model [i.e.,
$E(MS_{F \times B/E}) = \sigma_e^2 + n\sigma_p^2$ for the family × block(environment) interaction (Table 2-1)], it appears
that they are nearly structurally identical. In fact, the variance components in Eq. 2-5 are
generally different from their counterparts in the full model because the calculated mean
square in the incomplete model is not necessarily equal to that from the full model for the
family × block(environment) interaction after the pooling of sum of squares and degrees of
freedom.

Further, consider the effect of incomplete models on the expected mean square of an
effect that is not directly affected by the relocation of sum of squares and degrees of freedom.
For example, the sum of squares for the family effect in the incomplete model is calculated
in the same way as that in the full model (Table 2-1), but

\[
E(SS^*_F) = E\left[ bnt \sum_{k=1}^{f} (\bar{y}_{..k.} - \bar{y}_{....})^2 \right] = (f-1)(\sigma_{e*}^2 + n\sigma_{p*}^2 + bnt\sigma_{f*}^2) \quad \text{(see Appendix 2-2)}, \tag{2-6}
\]

and

\[
E(MS^*_F) = \frac{1}{f-1} E(SS^*_F) = \sigma_{e*}^2 + n\sigma_{p*}^2 + bnt\sigma_{f*}^2 . \tag{2-7}
\]

Comparing Eq. 2-7 with the expected mean square from the full model for the family
effect [i.e., $\sigma_e^2 + n\sigma_p^2 + bn\sigma_{fe}^2 + bnt\sigma_f^2$], the expected mean square from the
incomplete model is not exactly the same (see Table 2-1). The variance component for the
family × environment interaction has disappeared in the incomplete model and, thus, the
expected variance components for other effects are different from their counterparts in the
full model. Nevertheless, numerically, the two equations of expected mean squares for the
family effect are equal because they are expectations of the same calculated mean square,
which is not affected by the elimination of the $fe_{ik}$ term in the incomplete model.
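
One way to see what the pooled mean square estimates in terms of the full-model components is to weight the two original expectations by their degrees of freedom. The sketch below does this for the multiple-tree-plot design; it is an illustrative calculation under the stated pooling assumption, with hypothetical function names and input values.

```python
from fractions import Fraction


def pooled_plot_ems(s2_e, s2_p, s2_fe, t, b, f, n):
    """Degrees of freedom and expectation of the pooled F x B/E mean square."""
    df_fxe = (t - 1) * (f - 1)
    df_plot = t * (b - 1) * (f - 1)
    ems_fxe = s2_e + n * s2_p + b * n * s2_fe   # full-model F x E expectation
    ems_plot = s2_e + n * s2_p                  # full-model F x B/E expectation
    df_pooled = df_fxe + df_plot
    # Expected pooled SS is the sum of the two expected SS; divide by pooled df.
    ems_pooled = Fraction(df_fxe * ems_fxe + df_plot * ems_plot, df_pooled)
    return df_pooled, ems_pooled


# Hypothetical design and variance components.
df_pooled, ems_pooled = pooled_plot_ems(s2_e=8, s2_p=1, s2_fe=2, t=3, b=4, f=10, n=5)
print(df_pooled, ems_pooled)
```

In this sketch the pooled expectation works out to the full-model plot expectation plus a fraction $bn(t-1)/(tb-1)$ of the dropped interaction variance, which shows why the starred components differ in value from their full-model counterparts.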

As indicated by these examples, it can be proven that the following relationships
generally hold between an incomplete linear model and its full linear model for balanced data: (1) for
a given effect dropped from a full linear model, its sum of squares and the associated degrees
of freedom are pooled into an effect which, often being either an interaction term or a nested
effect, links directly to the dropped effect (i.e., in matrix expression, the incidence matrix of
the recipient effect spans the incidence matrix of the dropped effect); (2) for any effect in an
incomplete model, its expected mean square resembles that in a full model except that (i) the
expected variance components are generally different in interpretation and value from their
counterparts in the full model and (ii) the expected variance component of the dropped effect
does not exist; and (3) for an effect whose sum of squares is not affected by the relocation
of sums of squares and degrees of freedom in an incomplete model, its calculated mean square
from the incomplete model is numerically equal to that from a full model; however, the
expectations of the mean square from the full and incomplete models are structurally
different in terms of the variance components involved.

Mating  and  Field  Experimental  Designs 

Design scenarios considered in this study included half-sib and full-sib progenies
tested in randomized complete block (RCB) experimental designs. The mating design for
creating full-sib progenies was the half-diallel mating design, which has been widely used
in  tree  breeding  (Namkoong  and  Roberds  1974;  Zobel  and  Talbert  1984;  Foster  and 
Bridgwater  1986;  Burdon  and  Van  Buijtenen  1990;  Huber  et  al.  1992;  Li  et  al.  1996;  White 
1996).  Half-sib  progenies  are  assumed  from  a polymix  mating  design,  which  is  also  widely 
used  in  tree  breeding  programs  (Zobel  and  Talbert  1984). 

Results 

Design variables of an experiment are denoted by: p, the number of parents used in
a full-sib mating design; k, the average number of crosses per parent; t, the number of testing
locations in the field experimental design; and b, the number of blocks within a location.
True variance components are expressed as: one-quarter of the additive genetic variance;
one-quarter of the additive genetic effect × environment interaction variance; one-quarter
of the dominance variance; one-quarter of the dominance × environment interaction variance; and
$\sigma_e^2$, the residual variance component. In all cases, $\Delta$ stands for bias relative to the
full-model estimates and predicted parental breeding values.


Case  Study 


Case  I.  Half-sib  families  tested  with  RCB  designs  and  single-tree  plots 

Suppose that trees of half-sib families are tested over multiple environments with a
randomized complete block (RCB) field experimental design and single-tree plots in each
environment. A full analytical model is:

\[ y_{ijk} = \mu + E_i + B_{ij} + f_k + fe_{ik} + e_{ijk} \tag{2-8} \]

where $\mu$, $E_i$, $B_{ij}$, $f_k$, and $fe_{ik}$ have the same definitions as in Eq. 2-1, and $e_{ijk}$ is the random
residual effect, $e_{ijk} \sim \mathrm{NID}(0, \sigma_e^2)$, where $\sigma_e^2$ here corresponds to $\sigma_p^2 + \sigma_e^2$ from Eq. 2-1.

By  the  rules  set  forth  for  full  models,  an  ANOVA  table  is  given  in  Table  2-2. 

For the same data, an incomplete model ignoring family × environment interaction
is:

\[ y_{ijk} = \mu + E_i + B_{ij} + f_k + e_{ijk} \tag{2-9} \]

which has the same definitions of elements as in Eq. 2-8.

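By the relationship between full and incomplete linear models for balanced data, it
is known that when the effect of family × environment interaction is dropped, its associated
sum of squares and degrees of freedom are pooled into the error term. Thus, the estimated
variance component for the error term from the incomplete model is

\[ \sigma_{e*}^2 = \frac{t(b-1)(f-1)\,\sigma_e^2 + (t-1)(f-1)(\sigma_e^2 + b\sigma_{fe}^2)}{(tb-1)(f-1)} = \sigma_e^2 + \frac{b(t-1)}{tb-1}\,\sigma_{fe}^2 , \tag{2-10} \]

which indicates a bias, as compared with the estimate from the full model, of the magnitude:

\[ \Delta\sigma_e^2 = \frac{b(t-1)}{tb-1}\,\sigma_{fe}^2 . \tag{2-11} \]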
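
Table 2-2. Sums of squares and expected mean squares for the full and incomplete models in an analysis of half-sib families tested with a randomized complete block design and single-tree plots.

Full model

| Source of variation | df | SS | EMS |
|---|---|---|---|
| Env. (E) | $t-1$ | $SS_E = bf \sum_{i=1}^{t} (\bar{y}_{i..} - \bar{y}_{...})^2$ | |
| Block (B/E) | $t(b-1)$ | $SS_{B/E} = f \sum_{i=1}^{t}\sum_{j=1}^{b} (\bar{y}_{ij.} - \bar{y}_{i..})^2$ | |
| Family (F) | $f-1$ | $SS_F = tb \sum_{k=1}^{f} (\bar{y}_{..k} - \bar{y}_{...})^2$ | $\sigma_e^2 + b\sigma_{fe}^2 + tb\sigma_f^2$ |
| F × E | $(t-1)(f-1)$ | $SS_{F \times E} = b \sum_{i=1}^{t}\sum_{k=1}^{f} (\bar{y}_{i.k} - \bar{y}_{i..} - \bar{y}_{..k} + \bar{y}_{...})^2$ | $\sigma_e^2 + b\sigma_{fe}^2$ |
| Residual | $t(b-1)(f-1)$ | $SS_{Res} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (y_{ijk} - \bar{y}_{ij.} - \bar{y}_{i.k} + \bar{y}_{i..})^2$ | $\sigma_e^2$ |
| Total | $tbf-1$ | $SS_{Total} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (y_{ijk} - \bar{y}_{...})^2$ | |

Incomplete model

| Source of variation | df | SS | EMS |
|---|---|---|---|
| Env. (E) | $t-1$ | $SS_E = bf \sum_{i=1}^{t} (\bar{y}_{i..} - \bar{y}_{...})^2$ | |
| Block (B/E) | $t(b-1)$ | $SS_{B/E} = f \sum_{i=1}^{t}\sum_{j=1}^{b} (\bar{y}_{ij.} - \bar{y}_{i..})^2$ | |
| Family (F) | $f-1$ | $SS_F = tb \sum_{k=1}^{f} (\bar{y}_{..k} - \bar{y}_{...})^2$ | $\sigma_{e*}^2 + tb\sigma_{f*}^2$ |
| Residual | $(tb-1)(f-1)$ | $SS_{Res} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (y_{ijk} - \bar{y}_{ij.} - \bar{y}_{..k} + \bar{y}_{...})^2$ | $\sigma_{e*}^2$ |
| Total | $tbf-1$ | $SS_{Total} = \sum_{i=1}^{t}\sum_{j=1}^{b}\sum_{k=1}^{f} (y_{ijk} - \bar{y}_{...})^2$ | |

Note that the numbers of levels for environment, block within environment, family, and trees within a plot are
denoted by t, b, f, and n, respectively.
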
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From  Eq.2-1 1,  it  is  known  that  the  estimate  of  error  variance  is  biased  unless  t=l  and 
b>l  (i.e.,  for  single  site  data  with  RGB  design).  Eq.2-1 1 is  not  defined  when  t =b=l  since 
such  an  experiment  is  no  longer  statistically  valid. 

The mean square calculated for the family effect is not affected by the reallocation of sums of squares and degrees of freedom in the incomplete model (see the family effects of the full and incomplete models in Table 2-2); thus

$$\sigma_{e^*}^2 + tb\,\sigma_{f^*}^2 = \sigma_e^2 + b\,\sigma_{fe}^2 + tb\,\sigma_f^2 ,$$

and

$$\sigma_{f^*}^2 = \sigma_f^2 + \frac{1}{tb}\left(\sigma_e^2 + b\,\sigma_{fe}^2 - \sigma_{e^*}^2\right) = \sigma_f^2 + \frac{b-1}{tb-1}\,\sigma_{fe}^2 . \tag{2-12}$$



Therefore, compared with the estimate from the full model, the bias in the estimate of the family variance component from the incomplete model is

$$\Delta\sigma_f^2 = \frac{b-1}{tb-1}\,\sigma_{fe}^2 , \tag{2-13}$$

which is a function of the field experimental design and the true variance component of the dropped effect. Equation 2-13 indicates that for the special case of t = 1 and b > 1, the estimated family variance component from the incomplete model is biased upward by the full magnitude of $\sigma_{fe}^2$. It is also worth noting that, with this experimental design, the sum of the biases equals the variance of the family x environment interaction, i.e.,

$$\Delta\sigma_e^2 + \Delta\sigma_f^2 = \frac{b(t-1)}{tb-1}\,\sigma_{fe}^2 + \frac{b-1}{tb-1}\,\sigma_{fe}^2 = \sigma_{fe}^2 . \tag{2-14}$$

Assuming that the variance component of half-sib families estimates 1/4 of the total additive genetic variance (Falconer 1981), the bias in the heritability estimate from the incomplete model is

$$\Delta h^2 = \frac{4\,\Delta\sigma_f^2}{\sigma_{f^*}^2 + \sigma_{e^*}^2} = \frac{4\,\dfrac{b-1}{tb-1}\,\sigma_{fe}^2}{\sigma_{f^*}^2 + \sigma_{e^*}^2} , \tag{2-15}$$

and the bias for predicted parental breeding values is

[Eq. 2-16: expression not recoverable from the source]
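
As a numerical companion to the Case I results, the sketch below evaluates the two bias coefficients, b(t-1)/(tb-1) for the error variance and (b-1)/(tb-1) for the family variance, and the resulting relative bias in heritability. The function names are hypothetical, the variance components are those listed for the low-heritability half-sib architecture in the Figure 2-1 caption, and b = 15 is an assumed block number chosen for illustration.

```python
from fractions import Fraction


def case1_biases(t, b, var_fe):
    """Return (bias in error variance, bias in family variance) for t sites, b blocks."""
    if t * b == 1:
        raise ValueError("t = b = 1 leaves no residual degrees of freedom")
    bias_error = Fraction(b * (t - 1), t * b - 1) * var_fe   # Eq. 2-11
    bias_family = Fraction(b - 1, t * b - 1) * var_fe        # Eq. 2-13
    return bias_error, bias_family


def relative_h2_bias(t, b, var_f, var_fe):
    """Relative bias of h^2. The phenotypic variance (the denominator of h^2)
    is unchanged in this case, so the ratio reduces to bias_family / var_f."""
    return case1_biases(t, b, var_fe)[1] / var_f


if __name__ == "__main__":
    # sigma^2_f = 0.25 and sigma^2_fe = 0.1667 (h^2 = 0.1); b = 15 is assumed.
    for t in range(1, 7):
        pct = 100 * relative_h2_bias(t, 15, var_f=0.25, var_fe=0.1667)
        print(f"t = {t}: relative bias in h^2 = {pct:.1f}%")
```

Exact fractions are used for the coefficients so that the identity "sum of the two biases equals the interaction variance" can be checked without rounding error.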


Case II. Half-sib families tested with RCB designs and multiple-tree plots

For this experimental design, the full and incomplete linear models are given in Eq. 2-1 and Eq. 2-2, respectively, and their ANOVA tables are given in Table 2-1. Again, by the relationship between full and incomplete linear models, the sum of squares and the degrees of freedom associated with the family x environment interaction are pooled into the family x block(environment) effect in the incomplete model. The theoretical estimates of the variance components for the error term, the family x block(environment) interaction, and the family effect in the incomplete model are therefore, respectively,

$$\sigma_{e^*}^2 = \sigma_e^2 , \tag{2-17}$$

$$\sigma_{fb^*}^2 = \sigma_{fb}^2 + \frac{b(t-1)}{tb-1}\,\sigma_{fe}^2 , \tag{2-18}$$

and

$$\sigma_{f^*}^2 = \sigma_f^2 + \frac{b-1}{tb-1}\,\sigma_{fe}^2 . \tag{2-19}$$

By the same assumption as in Eq. 2-15 (i.e., $\sigma_A^2 = 4\sigma_f^2$), the bias in the heritability estimate from the incomplete model for this experimental design is

$$\Delta h^2 = \frac{4\,\Delta\sigma_f^2}{\sigma_{f^*}^2 + \sigma_{fb^*}^2 + \sigma_{e^*}^2} , \tag{2-20}$$

which is nonzero if $\sigma_{fe}^2$ is not zero. As in Case I, the bias for predicted parental breeding values is

[Eq. 2-21: expression not recoverable from the source]

The estimate of phenotypic variance from the incomplete model with this experimental design is unbiased, since $\sigma_{P^*}^2 = \sigma_f^2 + \sigma_{fe}^2 + \sigma_{fb}^2 + \sigma_e^2$.
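
The claim that the phenotypic variance stays unbiased in Case II can be checked with exact arithmetic. The sketch below assumes the standard balanced-data result that the dropped family x environment variance is split between the family x block(environment) term, with share b(t-1)/(tb-1), and the family term, with share (b-1)/(tb-1). The function name and the numerical variance components are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def case2_estimates(t, b, var_f, var_fe, var_fb, var_e):
    """Expected incomplete-model estimates when F x E is pooled into F x B(E)."""
    share_fb = Fraction(b * (t - 1), t * b - 1)   # part of var_fe absorbed by F x B(E)
    share_f = Fraction(b - 1, t * b - 1)          # part of var_fe absorbed by family
    return {
        "e": var_e,                               # error variance is unaffected
        "fb": var_fb + share_fb * var_fe,
        "f": var_f + share_f * var_fe,
    }


# Hypothetical true components, chosen only to exercise the formulas.
truth = dict(var_f=Fraction(1, 4), var_fe=Fraction(1, 6),
             var_fb=Fraction(1, 2), var_e=Fraction(9))
est = case2_estimates(t=4, b=15, **truth)

# The two shares add up to one, so the total (phenotypic) variance is preserved.
assert sum(est.values()) == sum(truth.values())
```

Because the shares always sum to one, the inflation of the family component is exactly offset in the total, which is why only the numerator of heritability is biased.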

Case III. Full-sib families tested with RCB designs and single-tree plots

Suppose trees of full-sib families from a half-diallel (or circular) mating design are tested in the field with a randomized complete block (RCB) experimental design and single-tree plots. A complete analytical model is (Huber et al. 1992; 1994):

$$y_{ijkl} = \mu + E_i + B_{ij} + g_k + g_l + ge_{ik} + ge_{il} + s_{kl} + se_{ikl} + e_{ijkl} , \tag{2-22}$$

where $y_{ijkl}$ is the observation of the $kl$th cross in the $j$th block of the $i$th test;
$\mu$ is the overall mean;
$E_i$ is the fixed effect of the $i$th environment;
$B_{ij}$ is the fixed effect of the $j$th block within the $i$th environment;
$g_k$ is the random effect of the $k$th female general combining ability (gca), $g_k \sim \mathrm{NID}(0, \sigma_{gca}^2)$;
$ge_{ik}$ is the random effect of the female gca x environment interaction, $ge_{ik} \sim \mathrm{NID}(0, \sigma_{ge}^2)$;
$g_l$ is the random effect of the $l$th male gca, $g_l \sim \mathrm{NID}(0, \sigma_{gca}^2)$;
$ge_{il}$ is the random effect of the male gca x environment interaction, $ge_{il} \sim \mathrm{NID}(0, \sigma_{ge}^2)$;
$s_{kl}$ is the random effect of specific combining ability (sca) between the $k$th female and the $l$th male, $s_{kl} \sim \mathrm{NID}(0, \sigma_{sca}^2)$;
$se_{ikl}$ is the random effect of the sca x environment interaction, $se_{ikl} \sim \mathrm{NID}(0, \sigma_{se}^2)$; and
$e_{ijkl}$ is the random effect of the $kl$th cross in the $j$th block within the $i$th location, $e_{ijkl} \sim \mathrm{NID}(0, \sigma_e^2)$.

Assuming no reciprocal effects of parents and no covariance between any pair of random variables in the model, an ANOVA table for the full and some incomplete linear models is given in Table 2-3. For this full-sib mating design, four potential incomplete mixed linear models are discussed below.

SCA  X environment  interactions  ignored 

In a full-model analysis of full-sib data, the SCA x environment interaction generally accounts for a large proportion of the degrees of freedom in the analytical model. Consequently, the computational demands can be substantially reduced if this effect is ignored. When the SCA x environment interaction is dropped from the full mixed linear model, its degrees of freedom and sum of squares are pooled into the error term. The potential biases in the estimates of variance components, heritability, and predicted parental breeding values with such an incomplete model can then be calculated as:

[Eqs. 2-23 to 2-25: expressions not recoverable from the source; the surviving fragments show that they depend on p, k, t, and b]

Table 2-3. Expected mean squares for random effects in (a) the full model, (b) the incomplete model ignoring GCA x E and SCA x E, (c) the incomplete model ignoring SCA and SCA x E, and (d) the incomplete model ignoring SCA, GCA x E, and SCA x E, in the analysis of full-sib families tested with a randomized complete block design and single-tree plots.

[Table body (columns: Source, df, MS, EMS) not recoverable from the source]

Note: k is the number of crosses per parent.

This incomplete linear model results in downward biases in the estimates of additive genetic variance and heritability. Accordingly, the predicted breeding values are shrunk more than those from the full analytical model.

SCA x environment and GCA x environment interactions ignored

The incomplete linear model ignoring SCA x environment and GCA x environment interactions is

$$y_{ijkl} = \mu + E_i + B_{ij} + g_k + g_l + s_{kl} + e_{ijkl} , \tag{2-26}$$

whose elements have the same definitions as in Eq. 2-22. The ANOVA table for this incomplete model is given in Table 2-3.

By the relationships between full and incomplete linear models for balanced data, the sums of squares of the dropped effects in this incomplete model are pooled into the error term. The variance component estimates for the error term, the SCA effect, and the GCA effect, as compared with those from the full model, are, respectively,

[Eqs. 2-27 and 2-28: expressions not recoverable from the source]

Assuming that the full-sib family variance component estimates one-half of the total additive genetic variance (Falconer 1981), the heritability estimate from this incomplete model is

[Eq. 2-29: expression not recoverable from the source]

and the bias for predicted parental breeding value is

[Eq. 2-30: expression not recoverable from the source]

Note that, in this experiment, the estimate of the phenotypic variance (the denominator in Eq. 2-29) is also biased as compared with that from the full model.

SCA  and  SCA  x environment  interaction  ignored 

The incomplete model dropping SCA and SCA x environment interaction is

$$y_{ijkl} = \mu + E_i + B_{ij} + g_k + g_l + ge_{ik} + ge_{il} + e_{ijkl} ,$$

whose elements have the same definitions as in Eq. 2-22; its ANOVA table is given in Table 2-3. Because the sums of squares of SCA and SCA x environment interaction are also pooled into the error term in the incomplete model, the estimates of the variance components for the error term, the GCA x environment interaction, and the GCA effect are, respectively,

[Equations not recoverable from the source; the last of them, for the GCA effect, is Eq. 2-34]

Equation 2-34 indicates that when SCA and SCA x environment interaction are excluded from the model, the bias in the estimate of the additive genetic variance component is mostly affected by the number of crosses per parent.

With the same assumptions as in Eq. 2-29, the heritability estimate with this incomplete linear model is

[Equation not recoverable from the source]

The bias for predicted parental breeding value is

[Equation not recoverable from the source]

From Eq. 2-36, it is clear that the estimates of additive genetic variance, heritability, and phenotypic variance from this incomplete model are all biased as compared with those from the full model.

Purely  additive  genetic  model 

For the purely additive genetic model, SCA, GCA x environment interaction, and SCA x environment interaction are all ignored. This is the most incomplete model that would be used to analyze full-sib progeny test data for parental values:

$$y_{ijkl} = \mu + E_i + B_{ij} + g_k + g_l + e_{ijkl} . \tag{2-37}$$

Equation 2-37 has the same definitions of elements as Eq. 2-22. Its ANOVA table is also given in Table 2-3.

Since the sums of squares of the dropped effects in this incomplete model are eventually pooled into the sum of squares of the error term, the estimates of the variance components for the error term and the GCA effect can be derived as, respectively,

[Equations not recoverable from the source]

Thus the heritability estimate from this incomplete model is

[Equation not recoverable from the source]

and the bias for predicted parental breeding values is

[Equation not recoverable from the source]

The estimates $\sigma_{e^*}^2$, $\sigma_{gca^*}^2$, and $h^{*2}$ are all biased as compared with those from the full linear model, and so is the estimate of phenotypic variance.


Discussion 


Theoretical consideration of the biases in estimates of genetic parameters and in predicted parental breeding values from incomplete mixed linear models confirms some previous empirical evidence (Wei and van der Werf 1992; Quinton and Smith 1997). Compared with full models, incomplete mixed linear models bias the variance component and heritability estimates, which in turn biases the predicted parental breeding values, as long as the true variance components of the dropped effects are not zero, regardless of whether the dropped effects are random or fixed. The direction of the bias in estimates of additive genetic variance and heritability can be downward or upward, depending on the specific progeny test and the incomplete linear model being used. While ignoring the SCA x environment interaction causes slight under-estimation of additive genetic variance and heritability, the biases for all other incomplete linear models are positive, indicating a risk of over-estimating genetic parameters if those models are used. These upward biases in additive genetic variance and heritability estimates produce a larger spread (i.e., variance) among predicted parental breeding values, which in turn results in proportionally upward-biased predicted genetic gains.

Population genetic architecture [parameters such as heritability ($h^2$), type-B genetic correlation ($r_B$), and the ratio of dominance variance to additive variance ($\gamma$)] affects the magnitudes of the biases in additive genetic variance and heritability estimates because it determines the magnitudes of the variance components of the dropped effects. For example, the theoretical projection for balanced data (Figure 2-1a & c) indicates that when the true population heritability is high and non-additive genetic control is weak (such as genetic architecture 3 in Table 2-4), the relative biases in heritability estimates from incomplete linear models are comparatively small. In contrast, when true heritability is low and non-additive genetic effects are strong (such as genetic architectures 1 and 2 in Table 2-4), the relative biases in heritability estimates from the incomplete models can be substantial (Figure 2-1).

When all G x E interactions are excluded from the linear model, the estimates of additive genetic variance and heritability are mainly affected by the number of environments in which each family is tested (Figure 2-1a & b). The relative biases in heritability estimates from the incomplete linear models increase dramatically when the number of environments is fewer than 5, regardless of the population genetic structure and the number of families involved. The extreme example of such an incomplete linear model is the single-site genetic test, in which the estimate of additive genetic variance is inflated by the whole variance component associated with the additive genetic effect x environment interaction.

For full-sib families tested with RCB designs and single-tree plots, if the SCA effect and its interaction with environment are excluded from a full linear model, the biases in the estimates of additive genetic variance and heritability are mainly affected by the average number of crosses per parent in the experiment, but they are independent of the number of environments. Biases become increasingly larger when the number of crosses per parent is fewer than 6 (Figure 2-1c). When both G x E interactions and SCA effects are dropped from a linear model for full-sib families tested with RCB designs and single-tree plots, the biases in the estimates of additive genetic variance and heritability tend to be cumulative. Figure 2-1d indicates that when both the number of environments and the number of crosses per parent are small (<4) in an experiment, the percentage bias in the heritability estimate from Model V can be as high as 60% if the true heritability is around 0.1.

When incomplete linear models must be used, the most desirable choice seems to be the one that ignores the SCA x environment interaction. Several reasons support this preference: (1) the degrees of freedom associated with the model terms are reduced dramatically in this incomplete linear model, which greatly reduces the memory required for data analysis; (2) the downward biases in heritability and predicted genetic gains are several-fold smaller than the biases from dropping other effects; and (3) in reality, the true variance component for the SCA x environment interaction is likely to be small.

Figure 2-1. Effects of population genetic architecture, mating design, and field experimental design on the biases of heritability estimates from incomplete linear models. Note: three levels of genetic architecture are represented by different levels of heritability. For half-sib families: $h^2 = 0.1$ ($\sigma_{gca}^2 = 0.25$, $\sigma_{ge}^2 = 0.1667$, $\sigma_{ph}^2 = 10$), $h^2 = 0.2$ ($\sigma_{gca}^2 = 0.50$, $\sigma_{ge}^2 = 0.1667$, $\sigma_{ph}^2 = 10$), and $h^2 = 0.3$ ($\sigma_{gca}^2 = 0.75$, $\sigma_{ge}^2 = 0.0833$, $\sigma_{ph}^2 = 10$). For full-sib families: $h^2 = 0.1$ ($\sigma_{gca}^2 = 0.25$, $\sigma_{sca}^2 = 0.25$, $\sigma_{ge}^2 = 0.1667$, $\sigma_{se}^2 = 0.1667$, $\sigma_{ph}^2 = 10$), $h^2 = 0.2$ ($\sigma_{gca}^2 = 0.50$, $\sigma_{sca}^2 = 0.25$, $\sigma_{ge}^2 = 0.1667$, $\sigma_{se}^2 = 0.0833$, $\sigma_{ph}^2 = 10$), and $h^2 = 0.3$ ($\sigma_{gca}^2 = 0.75$, $\sigma_{sca}^2 = 0.1875$, $\sigma_{ge}^2 = 0.0833$, $\sigma_{se}^2 = 0.0208$, $\sigma_{ph}^2 = 10$).

In addition to biased variance component estimates for additive genetic and other random effects, incomplete linear models also cause biased estimates of phenotypic variance. For the hypothetical cases discussed in this study, although the half-sib families yielded unbiased estimates of phenotypic variance when the G x E interaction was dropped from the full linear model, the four incomplete linear models of the full-sib experiments yielded biased estimates of phenotypic variance (as shown in Eq. 2-7b to 2-10b). The direction of the bias in the estimates of phenotypic variance can be positive or negative, depending on the specific incomplete linear model being used. However, the magnitudes of these biases are small, with negligible effects on the biases in heritability estimates.


Conclusion 

ANOVA relationships between full and incomplete linear models for balanced data provide a useful tool for studying the mechanisms that cause biases in variance component estimation when incomplete mixed linear models are used. When the elements of the mating design and field experimental design are known, and information about the magnitudes of variance components such as those for SCA and G x E interaction is available, the biases in the estimates of additive genetic variance and heritability from incomplete models can be precisely appraised. While the formulae derived in this study are for balanced data, good approximations of the biases in the estimates of additive genetic variance, heritability, and predicted parental breeding values may be obtained for unbalanced data by substituting the average values of the design variables into the formulae. The goodness of fit of such approximations, however, needs to be examined under different forest genetic scenarios before they can be applied confidently.
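
As a rough illustration of the substitution described above, the worked arithmetic below applies the half-sib, single-tree-plot bias formula $\Delta\sigma_f^2 = \frac{b-1}{tb-1}\sigma_{fe}^2$ to the low-heritability half-sib architecture of Figure 2-1 ($\sigma_f^2 = 0.25$, $\sigma_{fe}^2 = 0.1667$). The design values t = 4 and b = 15, and the use of an average of $\bar{b} = 9$ surviving blocks per family at each site as the "average design parameter", are assumptions made here purely for illustration.

```latex
\begin{align*}
\text{balanced } (t=4,\ b=15):\quad
  \Delta\sigma_f^2 &= \frac{14}{59}\times 0.1667 \approx 0.0396,
  &\frac{\Delta\sigma_f^2}{\sigma_f^2} &\approx \frac{0.0396}{0.25} \approx 15.8\% \\
\text{unbalanced } (t=4,\ \bar{b}=9):\quad
  \Delta\sigma_f^2 &\approx \frac{8}{35}\times 0.1667 \approx 0.0381,
  &\frac{\Delta\sigma_f^2}{\sigma_f^2} &\approx \frac{0.0381}{0.25} \approx 15.2\%
\end{align*}
```

Because the phenotypic variance is unbiased in this case, the relative bias in $\sigma_f^2$ is also the relative bias in $h^2$. Whether such a substitution tracks the bias actually obtained from unbalanced data is the open question raised in the paragraph above.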


CHAPTER  3 

POTENTIAL  BIASES  OF  INCOMPLETE  LINEAR  MODELS 
IN  GENETIC  PARAMETER  ESTIMATION  AND  BREEDING 
VALUE  PREDICTION:  UNBALANCED  DATA 

Introduction 

Assessment of biases based on balanced data provides useful information on the risk and acceptability of different incomplete mixed linear models in forest genetic data analysis. However, forest genetic testing data are mostly unbalanced for various reasons (Huber et al. 1994). For both half-sib and full-sib genetic testing data, imbalance can be caused by missing cells, missing observations within a cell, or both (White and Hodge 1989; Huber 1993; White 1996). Although closed-form expressions for the biases can be derived in terms of design parameters and variance components for balanced data, no such form can be derived for unbalanced data (Searle et al. 1992). Because unbalanced data structures diminish orthogonality among experimental factors, theoretical formulae derived under the assumption of orthogonality may become increasingly inaccurate as data become increasingly unbalanced. In addition, the preferred methods for estimating variance components from unbalanced data are restricted maximum likelihood (REML) approaches rather than ANOVA-based methods (Searle et al. 1992; Huber et al. 1994), which may further increase the discrepancy between the biases calculated theoretically from balanced data and the real biases from unbalanced data. It is thus unclear how well the formulae derived for balanced data approximate the biases for unbalanced data, and uncertainty remains about the potential magnitudes of biases from incomplete models when data are highly unbalanced.

The biases in genetic parameter estimation and breeding value prediction from incomplete linear models are not readily detectable with real experimental data, since the true population parameters are never exactly known. Simulated experimental data with known population genetic parameters and different data structures allow the biases to be investigated precisely under various circumstances, which may in turn indicate how serious the biases are. In this study, we analyze simulated forest genetic testing data with different mating designs, different population genetic architectures (i.e., relative magnitudes of G x E interaction and dominance variance), and varying data imbalance using the appropriate REML approaches, aiming to: (1) investigate the potential biases of incomplete mixed linear models in estimating heritabilities and predicting genetic gains for various unbalanced data when dominance effects and/or G x E interactions actually exist in non-inbred populations; and (2) evaluate the feasibility of using the formulae derived for balanced data to approximate the biases with unbalanced data by substituting the average design parameters of the experiments.

Methods 

Mating  and  Field  Experimental  Designs 

To avoid the confounding effects of inbreeding on the estimation of additive genetic variance (de Boer and van Arendonk 1992; de Boer and Hoeschele 1993; Hardner et al. 1996), large, randomly mating, non-inbred populations were assumed. Half-diallel, circular, and polymix mating designs were used in this study because they are among the most commonly used mating designs in tree breeding (Namkoong and Roberds 1974; Zobel and Talbert 1984; Burdon and van Buijtenen 1990; White et al. 1993). Both half-diallel and circular mating designs are full-sib mating designs, and therefore additive and dominance genetic variances as well as G x E interaction variances may be estimated. The polymix mating design represents a half-sib mating design, with only estimates of additive genetic variance and G x E interaction variance available. For the half-diallel design, each simulated experiment contained 15 parents randomly chosen from a large parental population, producing 105 crosses (14 crosses per parent). For the circular mating design, 52 randomly chosen parents were included in each experiment, producing 104 crosses (4 crosses per parent, i.e., 1x2, 1x3, 2x3, 2x4, ..., 52x1, 52x2). For the polymix mating design, 105 half-sib families were simulated for each experiment. For each mating design, 500 randomly simulated experiments were investigated for each pertinent level of genetic architecture and data imbalance.
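
The two full-sib mating designs can be enumerated in a few lines. The following is a minimal sketch, not the author's simulation code; the function names are hypothetical, and the circular design is assumed to pair each parent with the next two parents in a fixed circular order, which reproduces the listed pattern 1x2, 1x3, 2x3, 2x4, ..., 52x1, 52x2.

```python
from collections import Counter
from itertools import combinations


def half_diallel(n_parents):
    """All pairwise crosses, without selfs or reciprocals."""
    return list(combinations(range(1, n_parents + 1), 2))


def circular(n_parents, step=2):
    """Cross each parent i with the next `step` parents, wrapping around."""
    crosses = []
    for i in range(1, n_parents + 1):
        for s in range(1, step + 1):
            j = (i - 1 + s) % n_parents + 1   # 1-based index with wrap-around
            crosses.append((i, j))
    return crosses


def crosses_per_parent(crosses):
    """Count how many crosses each parent takes part in."""
    counts = Counter()
    for a, b in crosses:
        counts[a] += 1
        counts[b] += 1
    return counts


hd = half_diallel(15)   # 105 crosses, 14 per parent
cm = circular(52)       # 104 crosses, 4 per parent
```

Counting crosses per parent this way gives the design parameter k used in the bias formulae for full-sib data.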

The field experimental design was a randomized complete block design with single-tree plots (one individual per family in each block). Families produced in each experiment were assumed to be tested over 4 locations with 15 blocks at each location. Block sizes were restricted to no more than 105 trees so that each block could be arranged within 0.1 ha of land under commonly used spacing (Matheson 1989; White 1996). Single-tree plots were adopted to achieve higher statistical efficiency, as recommended by previous studies (Lambeth and Gladstone 1983; Loo-Dinkins and Tauer 1987; Loo-Dinkins et al. 1990; White 1996). Four locations and 15 blocks were used to permit (a) a sufficient number of degrees of freedom to sample G x E interaction variance and (b) reasonable sensitivity to distinguish real differences between any pair of families or parents (Cotterill and James 1982).

Data  Generation 

For experiments with half-diallel and circular mating designs, data were generated from the following mixed linear model, assuming infinitesimal gene action for a continuous trait:

$$y_{ijkl} = \mu + E_i + B_{ij} + g_k + g_l + s_{kl} + ge_{ik} + ge_{il} + se_{ikl} + e_{ijkl} , \tag{3-1}$$

where $y_{ijkl}$ is the observation on an individual of the $kl$th cross in the $j$th block within the $i$th location;
$\mu$ is the overall mean;
$E_i$ is the fixed effect of the $i$th location;
$B_{ij}$ is the fixed effect of the $j$th block within the $i$th location;
$g_k$ is the random effect of the $k$th female general combining ability (gca), $g_k \sim \mathrm{NID}(0, \sigma_{gca}^2)$;
$g_l$ is the random effect of the $l$th male gca, $g_l \sim \mathrm{NID}(0, \sigma_{gca}^2)$;
$s_{kl}$ is the random effect of specific combining ability (sca) between the $k$th female and the $l$th male, $s_{kl} \sim \mathrm{NID}(0, \sigma_{sca}^2)$;
$ge_{ik}$ is the random effect of the interaction between the $k$th female gca and the $i$th location, $ge_{ik} \sim \mathrm{NID}(0, \sigma_{ge}^2)$;
$ge_{il}$ is the random effect of the interaction between the $l$th male gca and the $i$th location, $ge_{il} \sim \mathrm{NID}(0, \sigma_{ge}^2)$;
$se_{ikl}$ is the random effect of the interaction between the $kl$th sca and the $i$th location, $se_{ikl} \sim \mathrm{NID}(0, \sigma_{se}^2)$; and
$e_{ijkl}$ is the residual effect of the $kl$th cross (family) in the $j$th block within the $i$th location, $e_{ijkl} \sim \mathrm{NID}(0, \sigma_e^2)$.

Reciprocal effects of parents were assumed to be nil; furthermore, it was assumed that no covariance existed between any pair of random variables in the model. The effects of location and block were regarded as fixed effects in the data analyses. For convenience in data generation, however, the values of the location and block effects were drawn from normally distributed populations, with location $E_i \sim \mathrm{NID}(0, 20)$ and block $B_{ij} \sim \mathrm{NID}(0, 10)$, respectively, across the 500 experiments. Since male and female parents are random samples of the same population, which is feasible in mating designs of forest trees (Zobel and Talbert 1984), the variances of their GCA effects and of their GCA x environment interactions can be pooled; thus

$$E(y_{ijkl}) = \mu + E_i + B_{ij} , \tag{3-2}$$

$$\mathrm{Var}(y_{ijkl}) = 2\sigma_{gca}^2 + 2\sigma_{ge}^2 + \sigma_{sca}^2 + \sigma_{se}^2 + \sigma_e^2 . \tag{3-3}$$


For experiments with the polymix mating design, the mixed linear model used for data generation was:

$$y_{ijk} = \mu + E_i + B_{ij} + g_k + ge_{ik} + e_{ijk} , \tag{3-4}$$

where $y_{ijk}$ is the observation of the $k$th family in the $j$th block of the $i$th location;
$e_{ijk}$ is the residual effect of the $k$th family in the $j$th block within the $i$th location, $e_{ijk} \sim \mathrm{NID}(0, \sigma_e^2)$; and
$\mu$, $E_i$, $B_{ij}$, $g_k$, and $ge_{ik}$ have the same definitions as in Eq. 3-1.

Again, it was assumed that no covariance existed between any pair of random variables, so that

$$E(y_{ijk}) = \mu + E_i + B_{ij} , \tag{3-5}$$

$$\mathrm{Var}(y_{ijk}) = \sigma_{gca}^2 + \sigma_{ge}^2 + \sigma_e^2 . \tag{3-6}$$
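
The data-generation step for the full-sib model (Eq. 3-1) can be sketched as follows. This is an illustrative reading of the model, not the author's program: the function name, the overall mean of zero, and the seed are hypothetical; the default variance components are those of the low-heritability full-sib population in Table 3-1; and the second arguments of NID(0, 20) and NID(0, 10) are assumed to be variances.

```python
import random


def simulate_full_sib(crosses, n_loc=4, n_block=15, mu=0.0,
                      v_gca=0.25, v_sca=0.25, v_ge=0.1667, v_se=0.1667,
                      v_e=8.75, v_loc=20.0, v_blk=10.0, seed=1):
    """Generate one balanced experiment (one tree per cross per block)."""
    rng = random.Random(seed)

    def draw(var):
        # One draw from N(0, var); random.gauss takes a standard deviation.
        return rng.gauss(0.0, var ** 0.5)

    parents = sorted({p for cross in crosses for p in cross})
    g = {p: draw(v_gca) for p in parents}         # GCA effects (males and females pooled)
    s = {c: draw(v_sca) for c in crosses}         # SCA effects
    records = []
    for i in range(n_loc):
        e_loc = draw(v_loc)                       # location effect
        ge = {p: draw(v_ge) for p in parents}     # GCA x location interaction
        se = {c: draw(v_se) for c in crosses}     # SCA x location interaction
        for j in range(n_block):
            b_ij = draw(v_blk)                    # block within location
            for (k, l) in crosses:
                y = (mu + e_loc + b_ij + g[k] + g[l] + s[(k, l)]
                     + ge[k] + ge[l] + se[(k, l)] + draw(v_e))
                records.append((i, j, k, l, y))
    return records


# 15-parent half-diallel: 105 crosses x 4 locations x 15 blocks = 6300 trees.
crosses = [(k, l) for k in range(1, 16) for l in range(k + 1, 16)]
data = simulate_full_sib(crosses)
```

The polymix model (Eq. 3-4) follows the same pattern with the male GCA, SCA, and SCA x location terms removed.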

True  Population  Genetic  Parameters 

Population genetic parameters controlling the phenotypic expression of a continuous trait included the narrow-sense heritability [i.e., $h^2 = 4\sigma_{gca}^2/(2\sigma_{gca}^2 + 2\sigma_{ge}^2 + \sigma_{sca}^2 + \sigma_{se}^2 + \sigma_e^2)$ for full-sib and $h^2 = 4\sigma_{gca}^2/(\sigma_{gca}^2 + \sigma_{ge}^2 + \sigma_e^2)$ for half-sib], the type-B genetic correlation [$r_B = \sigma_{gca}^2/(\sigma_{gca}^2 + \sigma_{ge}^2)$] (Yamada 1962; Burdon 1977), and the ratio of dominance variance to additive variance ($\gamma = \sigma_{sca}^2/\sigma_{gca}^2$). Higher-order non-additive genetic effects (such as epistasis) and their interactions with environments were assumed to be absent from the true genetic architectures. Three levels of $h^2$ (0.1, 0.2, and 0.3), three levels of $r_B$ (0.60, 0.75, and 0.90), and three levels of $\gamma$ (0.25, 0.50, and 1.0) were sampled to reflect their potential ranges in forest tree species (Yeh and Heaman 1987; Adams et al. 1995; Dieters et al. 1995; Li et al. 1996). The true variance components for each population were then derived following Huber et al. (1994) by arbitrarily (but without loss of generality) setting the total phenotypic variance to 10. Instead of considering a factorial combination of the levels of the three genetic parameters (27 different populations), we chose three levels of genetic architecture to represent low, moderate, and high additive genetic control over a phenotypic trait (Table 3-1).

Table 3-1. Genetic parameters and true variance components used to simulate three populations for full-sib and half-sib progeny test data.

Progeny    Additive genetic   Genetic parameters            True variance components
type       control            $h^2$   $\gamma$   $r_B$      $\sigma_{gca}^2$   $\sigma_{sca}^2$   $\sigma_{ge}^2$   $\sigma_{se}^2$   $\sigma_e^2$

Full-sib   High               0.30    0.25       0.90       0.7500             0.1875             0.0833            0.0208            8.1251
Full-sib   Moderate           0.20    0.50       0.75       0.5000             0.2500             0.1667            0.0833            8.3333
Full-sib   Low                0.10    1.00       0.60       0.2500             0.2500             0.1667            0.1667            8.7500
Half-sib   High               0.30    -          0.90       0.7500             -                  0.0833            -                 9.1667
Half-sib   Moderate           0.20    -          0.75       0.5000             -                  0.1667            -                 9.3333
Half-sib   Low                0.10    -          0.60       0.2500             -                  0.1667            -                 9.5833
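
The variance components in Table 3-1 can be reproduced from the three genetic parameters. The sketch below is one way to do so, under stated assumptions: the function name is hypothetical, the relations for the GCA, GCA x E, and SCA components follow from the definitions of heritability, type-B genetic correlation, and the dominance ratio with the phenotypic variance fixed at 10, and the relation used for the SCA x E component (the same type-B ratio applied to SCA) is inferred from the tabulated values rather than stated in the text.

```python
def variance_components(h2, r_b, gamma=None, var_p=10.0):
    """Derive true variance components from h^2, r_B and (for full-sibs) gamma."""
    v_gca = h2 * var_p / 4.0                  # h^2 = 4 * v_gca / var_p
    v_ge = v_gca * (1.0 - r_b) / r_b          # r_B = v_gca / (v_gca + v_ge)
    if gamma is None:                         # half-sib: no dominance terms
        return {"gca": v_gca, "ge": v_ge, "e": var_p - v_gca - v_ge}
    v_sca = gamma * v_gca                     # gamma = v_sca / v_gca
    v_se = v_sca * (1.0 - r_b) / r_b          # assumption inferred from Table 3-1
    v_e = var_p - 2 * v_gca - 2 * v_ge - v_sca - v_se
    return {"gca": v_gca, "sca": v_sca, "ge": v_ge, "se": v_se, "e": v_e}


full_high = variance_components(h2=0.30, r_b=0.90, gamma=0.25)
half_low = variance_components(h2=0.10, r_b=0.60)
```

For the high-heritability full-sib population this gives an error variance of 8.125, which agrees with the tabulated 8.1251 up to rounding of the other components.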

Data  Imbalance 

Unbalanced data were simulated by deleting data under each level of the genetic
architecture and mating design. For the 15-parent-half-diallel design, three levels of data
imbalance were produced, i.e., balanced, moderately unbalanced and severely unbalanced
data. Moderately unbalanced data were created by first assuming a random loss of 35 full-sib
crosses (1/3 of the 105 crosses in an experiment) within each experiment to account
for the failure to make crosses or for the complete loss of some full-sib families during the
testing period. Then, an average of 40% mortality was assumed to simulate the situations
reported in forest genetic tests (Dieters et al. 1995). Severely unbalanced data were created
by assuming 70 missing crosses and 40% average mortality. For the circular mating design, only
two levels of data imbalance were considered, i.e., balanced and unbalanced data. The
unbalanced data were created by assuming 13 missing crosses (1/8 of the total crosses in an
experiment) and 40% average mortality. For the polymix mating design, 40% average mortality
was assumed for the unbalanced data, but no complete half-sib families were deleted.

In deleting full-sib families from the 15-parent-half-diallel and the 52-parent-circular
mating designs, restrictions were applied to ensure that the structures of these mating designs
would not be changed, i.e., no genetic disconnectedness would be created among progenies
within an experiment. These restrictions implied that at most 13 crosses out of 14 could be
deleted for a parent in the 15-parent-half-diallel design, while at most 3 crosses out of 4
could be deleted for a parent in the 52-parent-circular mating design.
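
The connectedness restriction can be illustrated with a small sketch. It treats parents as
nodes and crosses as edges of a graph, and it deletes a randomly chosen cross only if all
parents remain linked through the surviving crosses. Reading "no genetic disconnectedness" as
graph connectivity is an assumption made here, and the helper names `is_connected` and
`delete_crosses` are hypothetical; the original deletion procedure is not described in more
detail than the text above.

```python
import random
from itertools import combinations

def is_connected(parents, crosses):
    """Return True if every parent is linked to every other through the given crosses."""
    neighbours = {p: set() for p in parents}
    for a, b in crosses:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = parents[0]
    seen = {start}
    stack = [start]
    while stack:  # depth-first search from an arbitrary parent
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(parents)

def delete_crosses(parents, crosses, n_delete, rng):
    """Delete n_delete crosses at random without disconnecting the mating design."""
    remaining = list(crosses)
    candidates = list(crosses)
    rng.shuffle(candidates)
    deleted = 0
    for cross in candidates:
        if deleted == n_delete:
            break
        trial = [c for c in remaining if c != cross]
        if is_connected(parents, trial):  # keep the cross if its loss breaks connectivity
            remaining = trial
            deleted += 1
    if deleted < n_delete:
        raise ValueError("cannot delete that many crosses and stay connected")
    return remaining

# Half-diallel among 15 parents: 105 crosses, of which 35 are removed.
parents = list(range(15))
half_diallel = list(combinations(parents, 2))
kept = delete_crosses(parents, half_diallel, 35, random.Random(1))
```

A design that stays connected necessarily leaves every parent with at least one cross, which
agrees with the limits of 13 out of 14 and 3 out of 4 deletions per parent stated above.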

While  keeping  average  mortality  around  40%,  the  numbers  of  missing  observations 
were  allowed  to  fluctuate  among  families.  This  implied  that  at  a given  site,  mortality  for 
some of the families would be much higher than 40% (as high as 90%) while for some other
crosses,  the  mortality  could  be  lower  than  40%  (as  low  as  10%).  However,  for  the  whole 
population,  average  mortality  was  clustered  around  40%.  Since  mortality  was  assumed  to 
vary  randomly  among  families  across  sites,  it  thus  did  not  contribute  extra-binomial  genetic 
variation  among  families. 
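
One way such family-level mortality could be generated is sketched below. The text specifies
only the overall mean (about 40%) and the observed range (roughly 10% to 90%); the scaled Beta
distribution, its spread, and the function names `family_mortality_rates` and
`surviving_trees` are assumptions chosen for illustration, not the procedure actually used.

```python
import random

def family_mortality_rates(n_families, rng, low=0.10, high=0.90, mean=0.40):
    """Draw one mortality rate per family, bounded in [low, high] with the requested mean."""
    # A Beta(a, b) variate has mean a / (a + b); a + b = 4 is an arbitrary choice of spread.
    target = (mean - low) / (high - low)
    a = 4.0 * target
    b = 4.0 * (1.0 - target)
    return [low + (high - low) * rng.betavariate(a, b) for _ in range(n_families)]

def surviving_trees(n_trees, rate, rng):
    """Count trees left in a family when each tree dies independently with probability rate."""
    return sum(1 for _ in range(n_trees) if rng.random() >= rate)

rng = random.Random(42)
rates = family_mortality_rates(5000, rng)
mean_rate = sum(rates) / len(rates)   # close to 0.40 by construction
```

Because the rates are drawn independently of any genetic effect, mortality in this sketch adds
sampling noise but no genetic variation among families, in line with the assumption stated
above.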

Analytical  Mixed  Linear  Models 

Five types of mixed linear models were used to analyze each of the simulated data
sets. They were: Model I, full models which were identical to those used in data generation
(i.e., Eq. 3-1 and 3-4); Model II, SCA x environment interaction excluded from the full model;
Model III, all G x E interactions excluded from the full models; Model IV, SCA effect and
its interaction with environment excluded from the full model; and Model V, only fixed
effects and additive genetic effects considered. The notations and assumptions about these
analytical models are the same as those in Eq. 3-1 and Eq. 3-4.

Variance component estimates and predicted breeding values were obtained using
the GAREML computer program (Huber 1993), which uses Giesbrecht’s (1983) algorithm
to obtain REML estimates of variance components and then uses these estimates to
calculate the best linear unbiased prediction (Henderson 1973) of parental breeding values
for a univariate trait. GAREML has repeatedly demonstrated its robustness to starting points
and precision in variance component estimation in forest genetic data analysis (Huber 1993;
Dieters et al. 1995).

Criteria  for  Model  Evaluation 

Heritability

Among the three types of true population genetic parameters (i.e., $h^2$, $r_B$ and $\gamma$), only
$h^2$ could be estimated from all 5 analytical models. Incomplete models ignoring dominance
effects (Model II, IV and V) could not estimate $\gamma$ and incomplete models excluding G x E
interaction (Model II, III and V) could not estimate $r_B$. With the assumptions of no reciprocal
effects, no epistasis and no inbreeding, $\hat{\sigma}^2_{gca}$ thus estimated one-quarter of additive
genetic variance (Falconer 1981). So, heritability was estimated for each of the 4 analytical models
under each of the combinations of mating design, genetic architecture and data imbalance as:

$$\hat{h}^2 = \frac{4\hat{\sigma}^2_{gca}}{\hat{\sigma}^2_p} \qquad \text{(3-7)}$$

where $\hat{\sigma}^2_p$ is the total phenotypic variance. For full-sib designs,
$\hat{\sigma}^2_p = 2\hat{\sigma}^2_{gca} + 2\hat{\sigma}^2_{ge} + \hat{\sigma}^2_{sca} + \hat{\sigma}^2_{se} + \hat{\sigma}^2_e$ for model I
(with $\hat{\sigma}^2_{se}$ denoting the SCA x environment interaction variance);
$\hat{\sigma}^2_p = 2\hat{\sigma}^2_{gca} + 2\hat{\sigma}^2_{ge} + \hat{\sigma}^2_{sca} + \hat{\sigma}^2_e$ for model II;
$\hat{\sigma}^2_p = 2\hat{\sigma}^2_{gca} + \hat{\sigma}^2_{sca} + \hat{\sigma}^2_e$ for model III;
$\hat{\sigma}^2_p = 2\hat{\sigma}^2_{gca} + 2\hat{\sigma}^2_{ge} + \hat{\sigma}^2_e$ for model IV; and
$\hat{\sigma}^2_p = 2\hat{\sigma}^2_{gca} + \hat{\sigma}^2_e$ for model V. For half-sib designs,
$\hat{\sigma}^2_p = \hat{\sigma}^2_{gca} + \hat{\sigma}^2_{ge} + \hat{\sigma}^2_e$ for model I and
$\hat{\sigma}^2_p = \hat{\sigma}^2_{gca} + \hat{\sigma}^2_e$ for model III.

For each analytical model under each of the combinations of mating design, genetic
architecture and data imbalance, there were 500 estimates of $\hat{h}^2$ from 500 simulated
experiments. The means of these estimates tended toward convergence. We took the
deviations of converged empirical means of 4 analytical models from the true population
$h^2$ as the measures of the magnitudes of biases.
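
The heritability calculation can be sketched in a few lines. The example assumes that
heritability is four times the GCA variance divided by the phenotypic variance, where the
phenotypic variance sums only the components a given model fits, with GCA and GCA x E counted
twice for full-sib families. The dictionary keys and the function names
`phenotypic_variance` and `heritability` are hypothetical. The inputs are the true components
of Table 3-1, not estimates from any fitted model.

```python
def phenotypic_variance(components, design="full-sib"):
    """Sum the variance components a model fits; components a model omits are simply absent."""
    # Two parents contribute GCA (and GCA x E) effects in full-sib families, one in half-sib.
    k = 2.0 if design == "full-sib" else 1.0
    return (k * components["gca"]
            + k * components.get("ge", 0.0)
            + components.get("sca", 0.0)
            + components.get("se", 0.0)
            + components["e"])

def heritability(components, design="full-sib"):
    """Narrow-sense heritability as 4 * GCA variance / phenotypic variance."""
    return 4.0 * components["gca"] / phenotypic_variance(components, design)

# True components from Table 3-1 (high additive control, full-sib; low control, half-sib).
full_sib_high = {"gca": 0.7500, "sca": 0.1875, "ge": 0.0833, "se": 0.0208, "e": 8.1251}
half_sib_low = {"gca": 0.2500, "ge": 0.1667, "e": 9.5833}

h2_full = heritability(full_sib_high, "full-sib")
h2_half = heritability(half_sib_low, "half-sib")
```

For an incomplete model, the same functions apply with the omitted components left out of the
dictionary, so that a Model V fit would supply only the `gca` and `e` entries.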

Ratio of predicted genetic gain to true genetic gain (R)

In tree breeding, genetic gain is usually defined as the difference between the mean
breeding value of selected individuals and the mean of the base population (Falconer 1981;
White and Hodge 1989). Since the true breeding values of all parents are known with
simulated data, the true genetic gains can be calculated exactly after selection. The ratio of
predicted genetic gain to true genetic gain thus shows whether genetic gain is over- or
under-predicted. In this study, it was assumed that the top-ranking 20% of parents (i.e., 3, 11 and
21 parents respectively for half-diallel, circular and polymix mating designs) were selected
within each experiment (across 4 sites) based on the predicted breeding values. When 5
analytical models were applied to the data of each experiment, there were 5 sets of predicted
breeding values and, consequently, 5 sets of selected parents which resulted in 5 pairs of
predicted and true genetic gains. For each analytical model, the mean of ratios of predicted
genetic gains to true genetic gains over 500 experiments was calculated for a given mating
design, genetic architecture and data imbalance. The deviation of the mean from 1 thus
reflected the accuracy of that model in genetic gain prediction.
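
A minimal sketch of the ratio for one experiment and one analytical model follows. It assumes
that both gains are measured as the mean breeding value of the selected parents minus the mean
over all parents, with selection made on predicted values only. The function name
`gain_ratio` and the five-parent data are hypothetical and serve only to show the arithmetic.

```python
def gain_ratio(predicted_bv, true_bv, n_selected):
    """Ratio R of predicted to true genetic gain after selecting on predicted breeding values."""
    n = len(predicted_bv)
    # Rank parents by predicted breeding value, best first, and keep the top n_selected.
    ranked = sorted(range(n), key=lambda i: predicted_bv[i], reverse=True)
    selected = ranked[:n_selected]
    pred_gain = (sum(predicted_bv[i] for i in selected) / n_selected
                 - sum(predicted_bv) / n)
    true_gain = (sum(true_bv[i] for i in selected) / n_selected
                 - sum(true_bv) / n)
    return pred_gain / true_gain

# Made-up example: predictions rank the parents correctly but are twice too spread out.
predicted = [2.0, 1.0, 0.0, -1.0, -2.0]
true = [1.0, 0.5, 0.0, -0.5, -1.0]
r = gain_ratio(predicted, true, n_selected=1)
```

In this made-up case the ranking, and hence the true gain, is unaffected, while the predicted
gain is doubled, so R is 2. Averaging such ratios over repeated experiments gives the quantity
compared with 1 in the text.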

Reliability  of  prediction 

Reliability is commonly defined as the squared correlation between predicted
breeding values and the true breeding values and has been used as a measure of the precision
of breeding value predictions (e.g., Uimari and Mantysaari 1993; Mrode 1996). True
estimates of reliability require the true covariance between predicted and true breeding
values as well as their variances. In practice, estimates of reliability are usually obtained
based on the estimated covariance from data samples since the true covariance is never
known. This implies that the precision of estimated reliability of breeding value predictions
can be affected by the precision of estimated genetic covariance and variances. For simulated
data with known true BVs, this possibility can be tested by comparing the true and estimated
reliabilities for a given data set. The true reliability was simply calculated by its definition
in the form:

$$r^2 = \frac{\mathrm{cov}^2(g, \hat{g})}{\hat{\sigma}^2_g \, \hat{\sigma}^2_{\hat{g}}} \qquad \text{(3-8)}$$

where $g$ and $\hat{g}$ are respectively the true and predicted BVs, $\mathrm{cov}(g, \hat{g})$ is the covariance
between $g$ and $\hat{g}$, and $\hat{\sigma}^2_g$ and $\hat{\sigma}^2_{\hat{g}}$ are the estimated variances of $g$ and $\hat{g}$.
The estimated reliability was calculated as:

$$\hat{r}^2 = 1 - \frac{\hat{\sigma}^2_{\varepsilon}}{\hat{\sigma}^2_{gca}} \qquad \text{(3-9)}$$

where $\hat{\sigma}^2_{\varepsilon}$ is the estimated prediction error variance for gca from the GAREML computer
program.
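
The two reliability measures can be contrasted in a short sketch. It assumes that true
reliability is the squared sample correlation between true and predicted breeding values, and
that estimated reliability is one minus the ratio of prediction error variance to estimated
GCA variance, which is the standard form for BLUP predictions. The function names and the
numeric inputs are hypothetical; no output of GAREML is reproduced here.

```python
def true_reliability(true_bv, pred_bv):
    """Squared sample correlation between true and predicted breeding values."""
    n = len(true_bv)
    mean_g = sum(true_bv) / n
    mean_p = sum(pred_bv) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(true_bv, pred_bv)) / (n - 1)
    var_g = sum((g - mean_g) ** 2 for g in true_bv) / (n - 1)
    var_p = sum((p - mean_p) ** 2 for p in pred_bv) / (n - 1)
    return cov ** 2 / (var_g * var_p)

def estimated_reliability(pred_error_var, var_gca):
    """Reliability implied by a prediction error variance and an estimated GCA variance."""
    return 1.0 - pred_error_var / var_gca

# Made-up values: predictions that are a perfect linear function of the true values.
r2_true = true_reliability([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r2_est = estimated_reliability(pred_error_var=0.05, var_gca=0.25)
```

The second function shows why an over-estimated GCA variance inflates the estimated
reliability: for a fixed prediction error variance, a larger denominator moves the result
toward 1 even though the true reliability is unchanged.
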

Goodness of Approximation of Biases for Unbalanced Data

The goodness of the approximations was evaluated by comparing the realized biases
obtained from the actual analyses of unbalanced data with those obtained from the theoretical
formulae derived in Chapter 2 using the average design parameters of unbalanced data.
Specifically, the simple regression coefficients of approximated biases on the empirically
realized biases over repeated random samples of unbalanced data for each of the combinations
of mating design and genetic architecture were used to reflect the goodness of the
approximations.
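
The regression summary can be computed as sketched below, assuming an ordinary least-squares
fit with an intercept in which the approximated bias is the response and the empirical bias is
the predictor. The function name `slope_and_r2` and the four data points are hypothetical and
are not values from this study.

```python
def slope_and_r2(empirical, approximated):
    """Slope and R-squared from a simple regression of approximated on empirical biases."""
    n = len(empirical)
    mean_x = sum(empirical) / n
    mean_y = sum(approximated) / n
    sxx = sum((x - mean_x) ** 2 for x in empirical)
    syy = sum((y - mean_y) ** 2 for y in approximated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(empirical, approximated))
    slope = sxy / sxx                 # change in approximated bias per unit empirical bias
    r2 = sxy ** 2 / (sxx * syy)       # share of variation explained by the regression
    return slope, r2

# Made-up biases (in percent) for four simulated samples.
b, r2 = slope_and_r2([10.0, 20.0, 30.0, 40.0], [8.0, 15.0, 26.0, 31.0])
```

A slope near 1 with a high R-squared would indicate that the formulae track the realized
biases closely, whereas a slope well below 1 indicates that the approximation grows more
slowly than the realized bias.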

Results 

Empirical  Biases 
Heritability

Full analytical mixed linear models (Model I), which were identical to those used in
data generation, yielded empirically unbiased estimates of heritabilities and were globally
superior to any of the incomplete linear models (Table 3-2). Across all mating designs, levels
of genetic architecture and degrees of data imbalance, none of the mean heritability estimates
from the full analytical linear models deviated significantly from the true population
parameters. For the same sets of data, however, all incomplete linear models but Model II
resulted in significant over-estimation of heritability. For the two full-sib mating designs, the
most incomplete linear model, ignoring both dominance effects and G x E interaction (i.e.,
Model V), produced a bias approximately equaling the sum of biases from Model III and IV,
in which either G x E interaction or dominance effect was ignored. Model II, which ignored
SCA x environment interaction only, yielded slightly downward biases of heritability
estimates but greatly facilitated the procedure of data analysis.

The relative magnitudes of biases in heritability estimates were largest when the
population’s true heritability and type B genetic correlation were low and the ratio of
dominance to additive genetic variances was high. In the genetic architecture level 1 (i.e.,
$h^2 = 0.1$, $r_B = 0.6$ and $\gamma = 1.0$), 26%-57% relative biases were produced by Model V, which were
consistently much higher than those in the genetic architecture level 2 (10%-27%) and level
3 (4%-12%).

The biases in heritability estimates with incomplete linear models increased with
the degree of data imbalance. When a few missing crosses (up to 10 out of 105 crosses) or
merely 40% mortality was assumed from the balanced 15-parent-half-diallel mating design,
only slight increases in bias were detected for incomplete models (data not shown).
However, when more crosses and observations were jointly deleted, the biases in heritability
estimates produced by Model IV and V increased dramatically with the decrease in the
average number of crosses per parent. In contrast, the biases produced by Model III (omitting
G x E interactions) were only slightly affected by data imbalance, which might be the result
of holding the field experimental design (i.e., the number of sites and blocks within site)
constant across mating designs and data imbalance (Figure 3-1).

Biases of heritability estimates from incomplete models for the circular mating design
(balanced and unbalanced data) were comparable to those from the severely unbalanced data
of the half-diallel mating design. The magnitudes of biases from the incomplete model in
polymix mating design were comparable to the biases produced by Model III in the
half-diallel and circular mating designs (Table 3-2).

Figure 3-1. Empirical estimates of heritability (a) and the ratio of predicted to realized genetic
gains (b) from full and incomplete mixed linear models for different levels of unbalanced data.
Data structures are full-sib families created from half-diallel and circular mating designs and
tested in a randomized complete block design with single-tree plots. True population
genetic parameters are: $h^2 = 0.1$, $r_B = 0.6$ and dominance to additive genetic variance component
ratio ($\gamma$) = 1.0.

Ratios of predicted to true genetic gains (R)

Similar to heritability estimates, the ratios of predicted genetic gains to true genetic
gains were uniformly best for the full analytical mixed linear models (Table 3-3). For the full
analytical models, R values were consistently close to 1 for any of the mating designs,
genetic architectures and levels of data imbalance. On the other hand, for all incomplete
models but Model II (i.e., Model III, IV, and V), R values were significantly larger than 1,
which indicated the over-prediction of genetic gains when using predicted parental breeding
values from these models. Again, mixed models in which only additive genetic effects were
considered consistently yielded the largest over-prediction (up to 60%), depending on
the nature of genetic architecture and the degree of data imbalance. Predicted
genetic gain from Model II was, however, only slightly lower than that from the full
model.

The effect of genetic architecture on R showed the same trend as that on heritability.
When true population genetic parameters $h^2$ and $r_B$ were low but $\gamma$ was high, larger biases
in genetic gain prediction were produced. With strong additive genetic control ($h^2 = 0.3$,
$r_B = 0.90$ and $\gamma = 0.25$), only about 5%-14% over-prediction of genetic gain would occur for
any of the incomplete analytical models. This contrasted sharply with the biases obtained in
genetic architecture level 1 (weak additive genetic control), in which 10%-32% over-prediction
of genetic gain would be expected for balanced data in the 15-parent-half-diallel mating
design and up to 60% over-prediction for balanced and unbalanced data in the 52-parent-circular
mating design.

Data imbalance also greatly affected the biases in genetic gain prediction (Table 3-3).
Compared to the balanced data in the 15-parent-half-diallel mating design, biases from
Model IV and V steadily increased with deletion of more crosses. This was not the case
for Model III, however, for which the biases stayed at roughly the same level across all degrees
of data imbalance and mating designs even though the average number of crosses and progenies
per parent changed considerably. Balanced data of the 52-parent-circular mating design had
a similar average number of crosses per parent to that in the most severely unbalanced case of
the 15-parent-half-diallel mating design, and they yielded similar magnitudes of biases from the
incomplete models.

Despite the over- or slight under-prediction of genetic gains, incomplete models
achieved almost the same amount of true genetic gain as that achieved by the full analytical
model for a given data set with the same selection method (data not shown). Although true
genetic gains from selection were significantly affected by the combination of genetic
architecture, mating design and data imbalance, little difference in true genetic gain was
produced by the five different analytical models for any of the data sets. Predicted
genetic gains, by contrast, depended on the exact values of the predicted breeding values and
were greatly influenced by the incomplete linear models. The true genetic gain achieved by
selecting the top parents depended only on the rankings from the various linear models. There
were similar parental rankings across all models as evidenced by their correlations. Pearson
correlation coefficients among the predicted breeding values from the 5 analytical models
for any data set were always close to 1.

Table 3-4. Mean of estimated reliabilities ($\hat{r}^2$) and true reliabilities ($r^2$) from four mixed linear models for three mating designs and varying data

Standard errors of the means vary between 0.000 and 0.002.

Reliability  of  prediction 

Estimates of reliability generally followed the same pattern as heritability
estimates and predicted genetic gains. Reliability estimates from full analytical models were
closest to the values of true reliability, especially for unbalanced data sets in genetic
architecture levels 1, 2, and 3. In contrast, most reliability estimates from incomplete models
were significantly biased upward (Table 3-4), which indicated that the predicted BVs were
stated to be more reliable than they truly were. The differences between true and estimated
reliability were largest for Model V with severely unbalanced data in the genetic architecture
level 1 and these biases became substantially smaller for balanced data in the genetic
architecture level 3. Values of true reliability were significantly affected only by the true
population genetic parameters and the degrees of data imbalance, and differed little between the
full and incomplete analytical models for a given data set. Estimates of reliability, on the
other hand, were affected greatly not only by true population genetic architectures and the
degrees of data imbalance, but also by the analytical models.

Goodness  of  Approximation  of  Biases  for  Unbalanced  Data 

Results from analyses of simulated data using the REML approach for variance
component estimation and then BLUP for breeding value prediction confirmed that, for
balanced data created from the polymix and half-diallel mating designs, the magnitudes of
empirical biases were identical to those calculated from the theoretical formulae (Tables 3-5,
3-6, and 3-7). For unbalanced data, the empirical biases were generally greater than those
calculated from the formulae using the average design parameters of experiments. This trend
was especially strong for the incomplete linear models ignoring SCA effect and its
interaction with environment, which suggested that the real biases from incomplete linear
models for unbalanced data were generally underestimated by the theoretical
approximations with formulae derived based on balanced data (Tables 3-5 and 3-6). However,
moderate to high Pearson correlation coefficients existed between the empirical and
theoretically approximated biases (Table 3-7). Simple regressions of approximated biases
on realized empirical biases were significant (p<0.0001) for all data structures considered in
this study. The approximation methods worked much better for the incomplete model
ignoring the effects of G x E interactions than for the incomplete model ignoring SCA
effects. For instance, for incomplete models ignoring G x E interactions, the theoretically
approximated biases accounted for more than 80% of the variation of the empirical
biases over repeated random samples of a given genetic architecture, mating design and data
imbalance. In contrast, for incomplete models ignoring SCA effects, the theoretically
approximated biases only accounted for 41-78% of the variation in the empirical biases,
depending on the properties of unbalanced data.

For the seemingly balanced full-sib data created from circular mating designs, the
approximated biases matched the empirical biases perfectly only for incomplete models
ignoring all G x E interactions. For all other incomplete models ignoring SCA and/or its
interaction with environment, the theoretical biases did not perfectly match the empirical
biases, although the two were highly correlated (Table 3-7). This indicated that data created
from the circular mating designs are not completely balanced even when there are no missing
values according to the mating design.

Table 3-6. Mean empirical and theoretically approximated biases of estimates of heritability
from incomplete linear models, expressed as a percentage of the estimates of heritability from
the full models (i.e., $b = \Delta h^2 / h^2 \times 100\%$)

Table 3-7. Regression slopes (b) and r-squares ($R^2$) of approximated biases on the empirical biases
for the estimates of additive genetic variances from different incomplete linear models.

Level of                     15-parent-half-diallel             52-parent-circular
additive                     Unbalanced 1     Unbalanced 2      Balanced         Unbalanced
genetic control¹    Model    b       $R^2$    b       $R^2$     b       $R^2$    b       $R^2$

1 (low)             II       0.726   0.795    0.816   0.855     0.903   0.949    1.061   0.955
$h^2 = 0.10$        III      0.897   0.885    0.845   0.797     1.000   1.000    0.922   0.907
$r_B = 0.60$        IV       0.508   0.549    0.484   0.661     0.610   0.770    0.612   0.815
$\gamma = 1.0$      V        0.699   0.712    0.532   0.654     0.645   0.782    0.629   0.804

2 (moderate)        II       0.651   0.746    0.743   0.858     0.940   0.965    0.914   0.908
$h^2 = 0.20$        III      0.864   0.880    0.836   0.821     1.000   1.000    0.837   0.841
$r_B = 0.75$        IV       0.485   0.536    0.402   0.411     0.611   0.757    0.618   0.785
$\gamma = 0.50$     V        0.676   0.723    0.469   0.594     0.648   0.771    0.638   0.787

3 (strong)          II       0.780   0.829    0.801   0.825     1.010   0.975    0.971   0.932
$h^2 = 0.30$        III      0.838   0.847    0.823   0.803     1.000   1.000    0.900   0.897
$r_B = 0.90$        IV       0.467   0.540    0.485   0.697     0.648   0.748    0.555   0.752
$\gamma = 0.25$     V        0.588   0.653    0.478   0.669     0.673   0.774    0.576   0.762

¹ 500 randomly simulated data samples were used in each of the regression analyses.


Discussion 

Bias 

Simulations in this study with unbalanced data of different mating designs and genetic
architectures have revealed considerable biases in heritability estimates and genetic gain
prediction from the use of incomplete analytical mixed linear models when dominance effects
and G x E interaction were present in the experimental data. Results from this study are
consistent with those from practical data analyses. For instance, in forestry, Dieters et al.
(1995) found that the average of heritability estimates from single-test analyses (ignoring
G x E interaction) was significantly higher than that from multiple-test analyses for a large
number of full-sib slash pine (Pinus elliottii Engelm. var. elliottii) families even when
dominance effects were considered in the analytical models. In poultry breeding, Wei et al.
(1993) found that an additive model ignoring dominance effects significantly increased the
estimates of heritabilities as compared with an animal model that included dominance
effects. Uimari and Mantysaari (1993) indicated that estimated reliabilities from models not
considering dominance effects were substantially higher than the empirical reliability
estimates based on the correlation between pedigree index and final sire proof in Finnish
dairy cow evaluation. Quinton and Smith (1997) further indicated that heritability estimated
from an additive model was too high to predict breeding values and genetic change when an
empirical check of the predicted benefits of BLUP in genetic evaluation was conducted with
a large body of Canadian pig performance records.

The  biases  in  heritability  estimates  and  predicted  genetic  gains  are  the  consequences 
of  the  over-estimation  of  additive  genetic  variances  from  the  incomplete  analytical  models. 
For  a given  level  of  the  genetic  architectures  and  data  imbalance  in  this  study,  it  was 
observed  that  the  mean  of  estimated  additive  genetic  variances  from  500  independent 
experiments  always  converged  to  a larger  value  than  the  true  additive  genetic  variance  for 
incomplete  models  but  not  for  the  full  analytical  models.  The  average  of  total  phenotypic 
variances, however, converged close to the designed true value of 10 for all models. Based on
the theoretically well-established relationships among additive genetic variance, heritability
and genetic gain (Falconer 1981), it is thus not surprising to observe such biases in
heritability estimates and predicted genetic gains from incomplete models.

Biases in heritability estimates and predicted genetic gains by Model V (no G x E and
SCA)  were  approximately  the  sum  of  the  biases  produced  by  Model  III  (no  G x E)  and  IV 
(no  SCA).  This  trend  was  observed  in  the  two  full-sib  mating  designs  across  all  levels  of 
genetic  architecture  and  data  imbalance.  This  implies  that  biases  from  incomplete  models 
behave  in  a cumulative  manner,  that  is,  the  more  effects  that  are  ignored  from  an  analytical 
model,  the  larger  the  biases  in  heritability  estimates  and  predicted  genetic  gains  are.  This 
result  is  also  consistent  with  the  theoretical  considerations  based  on  balanced  data. 

The relatively stable biases from Model III across mating designs and data imbalance
are attributable to the constant field experimental design (i.e., 4 locations, 15 blocks/location)
in this study. This is completely in agreement with the theoretical formula of bias for this
incomplete model, which is mainly a function of the number of locations in genetic tests.
Thus, there is no evidence to suggest that the magnitude of biases from Model III
will be limited to the scale observed in this study; the biases from Model III may
change dramatically given different field experimental designs, such as different numbers of
locations.

Approximation 

The theoretical formulae of biases derived based on balanced data in Chapter 2 did
not estimate the magnitudes of biases exactly for unbalanced data. Actual biases
for unbalanced data were mostly larger than those estimated from the formulae for
incomplete mixed Models III, IV, and V. This trend was especially strong for severely
unbalanced data with weak additive genetic control over a trait (Tables 3-5 and 3-6). The
larger actual biases from analysis of unbalanced data indicated that the approximation
formulae of biases derived based on balanced data were inadequate. This discrepancy may
be caused by the worsened orthogonality among experimental factors in the unbalanced
experimental data, which resulted in different ways of pooling sums of squares in the
incomplete models and different expected mean squares in data analysis.

Despite the inadequacy of the approximation formulae for severely unbalanced data,
there were high correlations between the approximated biases for each of the incomplete
mixed linear models and their actual biases. This suggests that the
approximated biases using the average design parameters of an experiment can be viewed
as the minimum bias that may be expected from using incomplete linear models in forest
genetic data analysis.

Implications  for  tree-breeding  programs 

Many economically important traits of forest trees have low heritabilities and
appreciable dominance effects and G x E interaction. For instance, in several conifer species,
narrow-sense heritability for volume growth has been reported to range from 0.1 to 0.2 (Yeh
and Heaman 1987; Dieters et al. 1995; Li et al. 1996). In loblolly pine (Pinus taeda L.),
dominance variance has been estimated to account for 50%-70% of the total genetic
variance in height, 50%-55% in DBH and 45%-60% in volume (Li et al. 1996). In slash
pine, the ratio of dominance variance to additive variance was estimated to range from 0.32
to 0.60 with multi-site data and 0.54-0.68 with single-site data, and type B genetic
correlations among sites were estimated to range from 0.61 to 0.76 except for a few cases in
which estimates were as high as 0.88 (Dieters et al. 1995). In longleaf pine, type B genetic
correlations were also estimated to be in the range of 0.61 to 0.74 (Adams et al. 1994). These
well-estimated genetic parameters from large data samples have indicated that genetic
architectures in most conifer tree breeding programs would fall into the categories of genetic
architecture level 1 and 2 as simulated in this study. Furthermore, a 15-parent-half-diallel
design is impractical. More commonly, the number of crosses per parent would be close to
that in the circular mating designs, in which 4-6 crosses per parent can be made. Combining the
potential genetic architectures and data structure in forest genetic tests suggests that estimates
of heritability, reliability and predictions of genetic gains from incomplete models would be
subject to severe biases, especially for those models that ignore both dominance effects and
G x E interaction.

Because true genetic gain from selection would not suffer from using an incomplete
analytical model as indicated in this study, an incomplete analytical model considering only
additive genetic effects could be used to save computational costs in cases where an accurate
prediction of genetic gain is not critical. However, whenever accurate estimation of
heritability and prediction of genetic gain are required, incomplete models should be
avoided, especially incomplete models that ignore both dominance genetic effects and
G x E interaction.


Conclusion 

While  full  analytical  mixed  models  consistently  yield  unbiased  estimates  of 
heritability  and  accurate  predictions  of  genetic  gain,  incomplete  mixed  models  generally 
yield  serious  biases  when  complex  genetic  structures  are  present  in  the  data.  The  magnitudes 
of biases from incomplete models are considerably larger when the true population
heritability is low, G x E is large and only a few crosses per parent are available. Reliability
estimates from incomplete models generally give false indications of the accuracy of the
breeding value prediction unless a trait is highly additively genetically controlled. Incomplete
models,  however,  can  be  used  to  rank  parents  for  selection  as  accurately  as  full-models.  No 
evidence  here  suggests  that  true  genetic  gains  suffer  from  using  incomplete  analytical 
models. 

Approximation formulae of biases based on balanced data generally under-estimate
the biases for incomplete mixed linear models when data are severely unbalanced. However,
approximated biases using the average design parameters of an unbalanced experiment can
be safely used as an indication of the minimum bias caused by an incomplete mixed linear
model, which can be used to evaluate the suitability of an incomplete linear model for a
specific data structure.


CHAPTER  4 

ESTIMATING  TYPE  B GENETIC  CORRELATIONS 

WITH  UNBALANCED  DATA  AND  HETEROGENEOUS  VARIANCES 

Introduction 

Defined  as  the  genetic  correlation  of  the  same  trait  measured  in  different 
environments  (Dickerson  1962;  Yamada  1962;  Burdon  1977),  type  B genetic  correlations 
have  many  applications  in  tree  improvement  programs.  In  numerous  genetic  studies, 
estimates  of  type  B genetic  correlations  have  been  used  as  quantitative  measures  of 
genotype-by-environment  (G  x E)  interactions  (Burdon  1977;  Johnson  and  Burdon  1990; 
Woolaston  et  al.  1991;  Adams  et  al.  1994;  Cooper  and  Delacy  1994;  Dieters  et  al.  1995; 
Dieters  1996;  Pswarayi  et  al.  1997),  which  are  important  considerations  in  formulating 
breeding strategies and in deploying genetically improved materials (Zobel and Talbert 1984;
White et al. 1993). In other tree improvement applications, estimates of type B genetic
correlations  frequently  serve  as  the  link  for  predicting  genetic  responses  from  indirect 
selection  (Jiang  1985;  White  and  Hodge  1989;  Surles  1993;  Wu  1993;  Johnson  1997). 

Statistical  methods  for  estimating  type  B genetic  correlation  have  been  well 
established  for  balanced  data.  In  forest  genetic  data  analyses,  type  B genetic  correlations 
have  been  routinely  estimated  using  either  the  method  of  Yamada  (1962)  or  the  formula  of 
Burdon (1977). So long as data are balanced, both methods are theoretically well defined and
yield identical and unbiased results. However, when severely unbalanced data are involved
along with heterogeneous (genetic and/or environmental) variances, theoretical concerns
arise  about  the  general  utilities  of  these  approaches  (Fernando  et  al.  1984)  due  to  their 
potential  biases. 

The  objectives  of  this  study  are  twofold:  (i)  to  develop  a new  approach  for  estimating 
type  B genetic  correlations  that  more  properly  accounts  for  unbalanced  data  with 
heterogeneous  variances  as  well  as  different  experimental  designs  across  environments,  and 
(ii)  to  compare  numerically  the  estimates  of  type  B genetic  correlations  from  different 
methods  using  simulated  data  sets  which  have  known  population  parameters. 


Theoretical  Considerations  of  Type  B Genetic  Correlations 


Background 

The concept of genetic correlation for the same trait measured in different
environments was first used by Falconer (1952) and the theory was further developed by
others  (Robertson  1959;  Dickerson  1962;  Yamada  1962).  To  distinguish  this  type  of  genetic 
correlation  from  the  genetic  correlation  between  different  traits  measured  on  the  same 
individuals,  Burdon  (1977)  called  the  former  type  B genetic  correlation. 

From  its  definition,  Yamada  (1962)  derived  an  estimation  method  of  type  B genetic 
correlation  based  on  a two-way  analysis  of  variance  (ANOVA).  For  a pair  of  environments, 
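the type B genetic correlation is estimated as:

$$\hat{r}_B = \frac{\hat{\sigma}^2_g}{\hat{\sigma}^2_g + \hat{\sigma}^2_I - \frac{1}{2}\left(\hat{\sigma}_{g1} - \hat{\sigma}_{g2}\right)^2} \qquad \text{(4-1)}$$

where $\hat{r}_B$ is the estimate of type B genetic correlation, $\hat{\sigma}^2_g$ is the genetic variance component
estimated from a two-way analysis of variance involving data from two environments
assuming homogeneous variances between them, $\hat{\sigma}^2_I$ is the estimate of the variance component
for the effect of G x E interaction, and $\hat{\sigma}^2_{g1}$ and $\hat{\sigma}^2_{g2}$ are the estimates of genetic variance
components within environments 1 and 2, respectively.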

Fernando  et  al.  (1984)  showed  that  for  balanced  data,  Yamada’s  method  is  well 
defined.  However,  for  unbalanced  data  Eq.4-1  yields  biased  estimates  of  type  B genetic 
correlation  under  customary  linear  models  (i.e.,  with  the  assumption  of  zero-covariance 
between  random  effects)  unless  genetic  and  environmental  variances  are  identical  across 
environments. While trying to theoretically justify Eq.4-1 with an alternative model, Itoh and
Yamada (1990) acknowledged that, in the reparameterization of Eq.4-1, giving equal weights
to each of the two environments is unreasonable if population sizes are finite and unequal.
These  questions  and  the  possible  violation  of  the  assumption  of  homogeneous  variances  in 
a two-way  analysis  of  variance  have  caused  theoretical  concerns  about  the  biases  that  may 
result  from  the  use  of  Yamada’s  method  when  data  are  severely  unbalanced  and  variances 
are  heterogeneous.  However,  the  severity  of  such  biases  has  not  been  well  demonstrated 
based  on  empirical  evidence  of  genetic  testing  data. 

Burdon  (1977)  provided  an  alternative  formula  for  the  estimation  of  type  B genetic 
correlation as:

$$\hat{r}_B = \frac{\hat{r}_{xy}}{\hat{h}_{\bar{x}}\,\hat{h}_{\bar{y}}} \qquad \text{(4-2)}$$

where $\hat{r}_{xy}$ is the phenotypic correlation between genetic group means in environments x and
y, and $\hat{h}_{\bar{x}}$ and $\hat{h}_{\bar{y}}$ are square roots of heritabilities of the genetic group means in environments x
and y, respectively.

Burdon’s  formula  is  based  on  the  assumption  of  uncorrelated  non-genetic  effects 
between  two  environments.  Provided  that  no  common  environmental  effects  exist  between 
two  testing  environments,  this  assumption  would  be  theoretically  appropriate.  It  was  shown 
(Burdon  1977)  that  Eq.4-2  yields  identical  estimates  of  type  B genetic  correlations  to  that 
from  Eq.4-1  when  data  are  balanced.  Since  Burdon’s  method  does  not  require  a two-way 
analysis  of  variance,  it  tactically  avoids  the  violation  of  the  homogeneous  variance 
assumption  in  cases  where  variances  (either  genetic  or  environmental)  are  heterogeneous 
across  environments.  In  addition,  because  only  genetic  group  means  are  used  in  the 
calculation  of  phenotypic  correlation,  and  heritabilities  of  genetic  group  means  are  estimated 
separately  in  each  environment,  Eq.4-2  facilitates  the  calculation  of  type  B genetic 
correlations  in  cases  where  experimental  designs  differ  across  genetic  tests. 
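
Burdon's formula can be illustrated with a minimal sketch. The function names and the numerical inputs below are hypothetical, and the family-mean heritability is assumed to take the usual form, family variance divided by the variance of a family mean, which the text does not spell out at this point.

```python
import math


def family_mean_heritability(var_g, var_e, n):
    """Heritability of a family mean based on n offspring.

    Assumed form: var_g / (var_g + var_e / n), i.e. the family variance
    divided by the variance of the family mean.
    """
    return var_g / (var_g + var_e / n)


def burdon_type_b(r_xy, h2_mean_x, h2_mean_y):
    """Eq. 4-2: r_xy divided by the product of the square roots of the
    family-mean heritabilities in the two environments."""
    if h2_mean_x <= 0 or h2_mean_y <= 0:
        raise ValueError("family-mean heritabilities must be positive")
    return r_xy / math.sqrt(h2_mean_x * h2_mean_y)


# Hypothetical balanced case: 19 offspring per family in each environment.
h2_x = family_mean_heritability(var_g=0.05, var_e=0.95, n=19)
h2_y = family_mean_heritability(var_g=0.05, var_e=0.95, n=19)
r_b = burdon_type_b(r_xy=0.3, h2_mean_x=h2_x, h2_mean_y=h2_y)
```

Because the two heritabilities are estimated separately, nothing in this calculation requires the two environments to share an experimental design.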

Despite these obvious advantages, estimates of type B genetic correlations from
Eq.4-2 may also become biased when missing values occur due to mortality or other
unforeseeable factors in the experiments which cause unequal numbers of observations per
genetic group. In such cases, genetic group means may be confounded by other experimental
factors, resulting in biased estimates of genetic covariance. Again, the severity of the
potential bias has not been numerically evaluated.

A New Approach Using BLUP-predicted Parental GCA Effects (GCA Approach)

Theory  of  the  new  approach 

To simplify the derivation without losing generality, let $\bar{x}_{1.}, \bar{x}_{2.}, \ldots, \bar{x}_{m.}$ be the
means of half-sib progenies from m independent female parents measured in environment
x from a randomized complete block (RCB) experimental design with one individual per
family in each plot (single-tree plots). Similarly, let $\bar{y}_{i.}$ be the means of half-sib progenies
of the same m parents measured in environment y with the same experimental design. Let $n_{xi}$
and $n_{yi}$ be, respectively, the number of offspring on which $\bar{x}_{i.}$ and $\bar{y}_{i.}$ are based.

For a one-way analysis of variance in each single environment, the analytical linear
models are:

$$x_{ij} = \mu_x + \beta_{xj} + g_{xi} + e_{xij} \qquad \text{(4-3a)}$$

and

$$y_{ik} = \mu_y + \beta_{yk} + g_{yi} + e_{yik} \qquad \text{(4-3b)}$$

respectively for environments x and y;
where $\mu_x$ and $\mu_y$ are the overall means;
$\beta_{xj}$ and $\beta_{yk}$ are the fixed effects of block, j = 1, 2, ... and k = 1, 2, ...;
$g_{xi}$ and $g_{yi}$ are the random effects of the ith family, $g_{xi} \sim NID(0, \sigma^2_{gx})$, $g_{yi} \sim NID(0, \sigma^2_{gy})$,
i = 1, 2, ..., m;
and $e_{xij}$ and $e_{yik}$ are the jth (or kth) residual effects within the ith family, $e_{xij} \sim NID(0, \sigma^2_{ex})$,
$e_{yik} \sim NID(0, \sigma^2_{ey})$.

Thus, $Var(\bar{x}_{i.}) = Var(\mu_x + \bar{\beta}_{x.} + g_{xi} + \bar{e}_{xi.}) = \sigma^2_{gx} + \sigma^2_{ex}/n_{xi}$, and similarly,
$Var(\bar{y}_{i.}) = \sigma^2_{gy} + \sigma^2_{ey}/n_{yi}$.

Note that when $n_{xi}$ differs among families, $Var(\bar{x}_{i.})$ is heterogeneous among the $\bar{x}_{i.}$'s, and this
is also true among the $Var(\bar{y}_{i.})$'s if $n_{yi} \neq$ constant. Therefore, the phenotypic correlation
between genetic group means from two environments, i.e.,

$$r_{xy} = \frac{Cov(\bar{x}_{i.}, \bar{y}_{i.})}{\sqrt{Var(\bar{x}_{i.})\,Var(\bar{y}_{i.})}}$$

is ambiguously defined due to the different distributional properties among the $\bar{x}_{i.}$'s in
environment x and/or among the $\bar{y}_{i.}$'s in environment y.

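A reasonable approach to alleviate these heterogeneous variances among the $\bar{x}_{i.}$'s and
$\bar{y}_{i.}$'s is to conduct a data standardization. Let

$$x_i' = \frac{\bar{x}_{i.} - \mu_x - \bar{\beta}_{x.}}{\sqrt{Var(\bar{x}_{i.})}} = \frac{\bar{x}_{i.} - \mu_x - \bar{\beta}_{x.}}{\sqrt{\sigma^2_{gx} + \sigma^2_{ex}/n_{xi}}}
\quad \text{and} \quad
y_i' = \frac{\bar{y}_{i.} - \mu_y - \bar{\beta}_{y.}}{\sqrt{Var(\bar{y}_{i.})}} = \frac{\bar{y}_{i.} - \mu_y - \bar{\beta}_{y.}}{\sqrt{\sigma^2_{gy} + \sigma^2_{ey}/n_{yi}}}$$

so that $Var(x_i') = Var(y_i') = 1$.

By the central limit theorem, when $n_{xi}$ and $n_{yi}$ are sufficiently large, then
$\bar{x}_{i.} \sim N(\mu_x + \bar{\beta}_{x.},\ \sigma^2_{gx} + \sigma^2_{ex}/n_{xi})$, so that $x_i' \sim N(0, 1)$; and similarly $y_i' \sim N(0, 1)$.

Thus,

$$r' = \frac{Cov(x_i', y_i')}{\sqrt{Var(x_i')\,Var(y_i')}} = Cov(x_i', y_i')$$

is uniformly defined.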
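
The standardization can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure actually used: the variance components and the fixed part of each family mean are taken as known, and the function name and input values are hypothetical.

```python
import math


def standardize_family_means(means, fixed_parts, n_offspring, var_g, var_e):
    """Return (mean - fixed part) / sqrt(var_g + var_e / n) for each family."""
    standardized = []
    for mean, fixed, n in zip(means, fixed_parts, n_offspring):
        sd_of_mean = math.sqrt(var_g + var_e / n)
        standardized.append((mean - fixed) / sd_of_mean)
    return standardized


# Hypothetical values: two families with the same raw deviation (+0.4)
# but very different numbers of surviving offspring.
z = standardize_family_means(
    means=[10.4, 10.4],
    fixed_parts=[10.0, 10.0],
    n_offspring=[40, 5],
    var_g=0.05,
    var_e=0.95,
)
```

The family mean based on 5 offspring receives the smaller standardized value, which is how less precise means end up with less leverage in the correlation.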

By conducting such data standardization, two major objectives are achieved, i.e.,
fixed effects of blocks are removed and variances are homogenized among families within
an environment.

4-5


This results because residual effects across environments are assumed to be
uncorrelated as in Eq.4-2, and when data are balanced the weights are equal for all families,
so that $Cov(\bar{x}_{i.}, \bar{y}_{i.})$ estimates one-quarter of the additive genetic covariance. For unbalanced
data, $w_{xi}$ and $w_{yi}$ can be viewed as the weights used in the calculation of genetic covariance
to account for the unequal number of observations among genetic groups. In the above
calculation, the means of genetic groups based on fewer observations will have less leverage
and, therefore, contribute less to the calculation of genetic covariance than the means of
genetic groups based on more observations. The significance of Eq.4-5 is that when data are
balanced it yields identical results to those from Eq.4-1 and Eq.4-2, but when data are
unbalanced it yields a properly weighted phenotypic correlation between family means in
two environments after removing fixed effects.

From Eq.4-5, it can be further developed that

$$\hat{r}_B = \frac{r'}{\hat{h}_{\bar{x}}\,\hat{h}_{\bar{y}}} \qquad \text{(4-6)}$$

where $\hat{r}_B$ is the estimate of type B genetic correlation, and $\hat{h}_{\bar{x}}$ and $\hat{h}_{\bar{y}}$ are square roots of
heritabilities of the genetic group means in environments x and y, respectively.

Equation 4-6 appears identical in form to Burdon's formula in Eq.4-2 except that $r'$ is
estimated differently from $\hat{r}_{xy}$ in Eq.4-2. In the current approach, $r'$ is estimated based on
residuals after removing all fixed effects and has been adjusted for the unequal number of
observations of genetic group means during the procedure of data standardization.

Operational calculations

Equations  4-5  and  4-6  are  theoretically  informative,  but  computationally  tedious, 
especially  when  complex  experimental  designs  are  involved.  We  show  below  that  the 
required  calculation  is  best  accomplished  by  using  the  predicted  parental  GCA  effects  with 
the  technique  of  best  linear  unbiased  prediction  (BLUP). 

Let $\hat{g}_{x1}, \hat{g}_{x2}, \ldots, \hat{g}_{xm}$ be the predicted parental GCA effects of the m parents in
environment x and $\hat{g}_{y1}, \hat{g}_{y2}, \ldots, \hat{g}_{ym}$ be the predicted GCA effects of the same m parents in
environment y. From the theory of BLUP (Henderson 1984), it is known that
$\hat{g}_{xi} = b_{\bar{x}i}(\bar{x}_{i.} - \hat{\tau}_{xi})$ and $\hat{g}_{yi} = b_{\bar{y}i}(\bar{y}_{i.} - \hat{\tau}_{yi})$, where $b_{\bar{x}i}$ and $b_{\bar{y}i}$ are regression coefficients, and
$\hat{\tau}_{xi}$ and $\hat{\tau}_{yi}$ are generalized least squares estimates of the fixed effects corresponding to
$\bar{x}_{i.}$ and $\bar{y}_{i.}$, respectively. With the above RCB experimental designs and considering blocks
as fixed effects, it can be derived (Searle et al. 1992, p.68) that:

Therefore, with the assumption of BLUP (Henderson 1984; White and Hodge 1989)
that variance components are known without error, we have


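$$Var\left(\frac{\hat{g}_{xi}}{\sqrt{b_{\bar{x}i}}}\right) = \frac{1}{b_{\bar{x}i}} Var(b_{\bar{x}i}\,\bar{x}_{i.}) = b_{\bar{x}i}\,Var(\bar{x}_{i.}) = b_{\bar{x}i}\left(\sigma^2_{gx} + \frac{\sigma^2_{ex}}{n_{xi}}\right) = \sigma^2_{gx}$$

(by substituting $b_{\bar{x}i}$ of Eq.4-7)

and similarly, $Var\left(\hat{g}_{yi}/\sqrt{b_{\bar{y}i}}\right) = \sigma^2_{gy}$.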
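
A quick numerical check of this identity is sketched below. It assumes that the regression coefficient takes the usual family-mean form, genetic variance divided by the variance of the family mean; the function names and variance components are hypothetical.

```python
def regression_coefficient(var_g, var_e, n):
    """Assumed family-mean regression coefficient: var_g / Var(mean)."""
    return var_g / (var_g + var_e / n)


def variance_of_weighted_gca(var_g, var_e, n):
    """Var(g_hat / sqrt(b)) = (1 / b) * b**2 * Var(mean) = b * Var(mean)."""
    b = regression_coefficient(var_g, var_e, n)
    var_of_mean = var_g + var_e / n
    return b * var_of_mean


# Hypothetical variance components; the result should equal var_g for any n.
results = {n: variance_of_weighted_gca(0.05, 0.95, n) for n in (5, 20, 40)}
```

Under that assumption the weighted prediction has variance equal to the genetic variance whatever the family size, which is what makes the weighted GCA effects comparable across unequally represented parents.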


Thus

$$Cov\left(\frac{\hat{g}_{xi}}{\sqrt{b_{\bar{x}i}}}, \frac{\hat{g}_{yi}}{\sqrt{b_{\bar{y}i}}}\right) = Cov\left(\sqrt{b_{\bar{x}i}}\,\bar{x}_{i.},\ \sqrt{b_{\bar{y}i}}\,\bar{y}_{i.}\right)$$

Thus $r^* = r'$ (comparing Eq.4-5 with Eq.4-11).

In the theory of breeding value prediction, Mrode (1996) demonstrates that, in
general, $b = k r_a^2$, where b is the regression coefficient, k is an integer (for parental GCA effect,
k = 1) and $r_a$ is the ‘accuracy’ of prediction with the technique of BLP (best linear prediction).

In Appendix 4-1, it is shown that: $E[Cov(\hat{g}_{xi}/r_{axi},\ \hat{g}_{yi}/r_{ayi}) / \overline{r_{ax} r_{ay}}] = \sigma_{gxy}$,
$E[Var(\hat{g}_{xi}/r_{axi})] = \sigma^2_{gx}$, and $E[Var(\hat{g}_{yi}/r_{ayi})] = \sigma^2_{gy}$. Thus, the type B genetic correlation is
estimated as:

where $\hat{r}_B$ is the estimate of type B genetic correlation, $r^*$ is the Pearson correlation coefficient
between weighted parental GCA effects predicted in environments x and y using BLUP,
and $\overline{r_{ax} r_{ay}}$ is the mean of products of prediction accuracies in the two environments.

It should be pointed out that in the derivation of Eq.4-12, $r_{ai}$ was assumed equal to
the prediction ‘accuracy’ using the technique of BLP. In practice, however, the technique of
BLUP needs to be applied to remove fixed effects, and prediction ‘accuracy’ is generally
smaller in BLUP than in BLP because BLUP estimates fixed effects while
BLP does not (White and Hodge 1989; Searle et al. 1992). The appropriate BLP $r_{ai}$ thus has
to be either precisely calculated using computer programs or converted approximately from
the output of BLUP programs. To convert the BLUP $r_{ai}$ to the BLP $r_{ai}$, the BLUP $r_{ai}$ needs to
be multiplied by an adjustment factor $\sqrt{m/(m-1)}$, where m is the number of parents tested in the
environment. Based on our numerical studies, the adjustment factor is satisfactory for a
variety of balanced and moderately unbalanced data sets created from half-sib or full-sib
mating designs. For severely unbalanced data, such as a circular mating design with more
than 50 parents being used, it is best to calculate the BLP $r_{ai}$ exactly using appropriate
computer programs.

In summary, Eq.4-12 estimates type B genetic correlations using predicted parental
GCA effects from the two environments concerned and involves the following steps:

1. Predict parental GCA effects in each environment using the technique of BLUP
(best linear unbiased prediction) with available BLUP software such as GAREML
(Huber 1993), MTDFREML (Boldman et al. 1995), ASREML (Gilmour et al. 1997), etc.

2. Divide each predicted parental GCA effect by its ‘accuracy’ of prediction in each
environment to create weighted parental GCA effects, i.e., $\hat{g}_i / r_{ai}$, where $\hat{g}_i$ is the predicted
parental GCA effect of the ith parent and $r_{ai}$ is its prediction ‘accuracy’ from BLP. If $r_{ai}$ is the
prediction ‘accuracy’ from BLUP, it needs to be multiplied by an adjustment factor $\sqrt{m/(m-1)}$,
where m is the number of parents tested in the environment.

3. Calculate the Pearson correlation coefficient of the weighted parental GCA effects
between the two environments concerned.

4. Divide the correlation coefficient from step 3 by the mean of the products of the
prediction accuracies for parents in the two environments.
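
Steps 2 to 4 can be sketched as follows, assuming the predicted GCA effects and their accuracies have already been exported from a BLUP program. The function names and the `accuracies_from_blup` flag are hypothetical, and the same number of parents m is assumed in both environments.

```python
import math


def pearson(u, v):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)


def gca_type_b(gca_x, acc_x, gca_y, acc_y, accuracies_from_blup=False):
    """Type B correlation from predicted parental GCA effects (steps 2 to 4)."""
    m = len(gca_x)
    if accuracies_from_blup:
        # Approximate conversion of BLUP accuracies to BLP accuracies.
        adjust = math.sqrt(m / (m - 1))
        acc_x = [r * adjust for r in acc_x]
        acc_y = [r * adjust for r in acc_y]
    # Step 2: weight each predicted GCA effect by its prediction accuracy.
    weighted_x = [g / r for g, r in zip(gca_x, acc_x)]
    weighted_y = [g / r for g, r in zip(gca_y, acc_y)]
    # Step 3: Pearson correlation of the weighted GCA effects.
    r_star = pearson(weighted_x, weighted_y)
    # Step 4: divide by the mean product of the accuracies.
    mean_product = sum(rx * ry for rx, ry in zip(acc_x, acc_y)) / m
    return r_star / mean_product
```

Nothing here truncates the result, so with few parents or low accuracies the returned value can fall outside the range 0 to 1.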

The  advantage  of  this  approach  lies  in  the  fact  that  by  using  predicted  parental  GCA 
effects  in  calculating  type  B genetic  correlation  several  goals  are  achieved  simultaneously: 
(1)  fixed  effects  are  properly  removed  with  the  technique  of  best  linear  unbiased  estimation 
(BLUE),  so  that  they  no  longer  confound  genetic  effects  when  data  are  unbalanced;  (2) 
heterogeneous  variances  associated  with  genetic  group  means  due  to  their  unequal  number 
of  observations  and  different  precision  are  automatically  adjusted  for  in  the  process  of 
breeding  value  prediction;  (3)  this  method  can  be  extended  to  calculate  additive  type  B 
genetic  correlations  when  data  of  full-sib  progenies  are  used;  and  (4)  this  method  is 
applicable when different experimental designs are used in the two environments, such as
different plot sizes.

Numerical Comparisons of Estimation Methods

Methods 
Data  generation 

Simulated data were used in the numerical comparison because the true underlying
type B genetic correlation was known for each data set. Data were generated based on a
randomized complete block design with single-tree plots, which is recommended by several
studies in forest genetic testing (Lambeth and Gladstone 1983; Loo-Dinkins and Tauer 1987;
Loo-Dinkins  et  al.  1990;  White  1996).  Genetic  structures  of  the  data  were  based  on  half-sib 
families  created  from  a polymix  mating  design  with  100  female  parents.  The  linear  model 
in matrix notation is:

where $y_i$ is the $n_i \times 1$ vector of phenotypic observations in environment i, i = 1, 2; $n_i$ is the
number of observations in environment i; $\mu_i$ is the overall mean in environment i and $1_i$ is
an $n_i \times 1$ vector of 1s; $X_i$ is the incidence matrix relating to the block effects (vector $\beta_i$) in
environment i; $Z_i$ is the incidence matrix relating to the female genetic effects (vector $g_i$) in
environment i; $Z_{mi}$ is the incidence matrix relating to the genetic effects of Mendelian
sampling (vector $g_{mi}$) in environment i; $e_i$ is the $n_i \times 1$ vector of residuals in environment i.
Covariance between different random effects in the model within an environment was
assumed nil.

The phenotypic value of each individual was determined as the summation of all the
pertinent genetic and environmental effects in the models. The levels of each effect were
assumed to be a random sample from a large population. Independent standard normal
deviates ($\mu = 0$, $\sigma^2 = 1$) were created using the SAS RANNOR function (SAS Institute Inc. 1990)
to reflect random variation within each effect. The magnitude of variation within each effect
was determined by its designed variance. For correlated effects between environments x and
y, the method of Van Vleck (1993) was used, in which

$$E_x = x\,\sigma_x, \quad \text{and} \quad E_y = x\,\frac{\sigma_{xy}}{\sigma_x} + y\,\sqrt{\sigma^2_y - \frac{\sigma^2_{xy}}{\sigma^2_x}}$$

where $E_x$ and $E_y$ are the correlated effects in environments x and y, x and y are independent
vectors of standard normal deviates, and $\sigma^2_x$, $\sigma^2_y$ and $\sigma_{xy}$ are the designed variances and
covariance for $E_x$ and $E_y$, respectively. Since estimates of type B genetic correlations are
invariant  with  respect  to  data  standardization  and  the  phenotypic  variance  in  each 
environment  can  always  be  standardized  to  one,  heterogeneous  variances  between  two 
environments  can  thus  be  reflected  by  their  difference  in  heritability.  In  this  study,  several 
combinations  of  heritabilities  between  two  environments  were  considered  to  represent  the 
situations  of  heterogeneous  variances  (Table  4-1  to  4-3). 
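
The generation of correlated effects described above can be sketched with the standard library. This is an illustration only: the original work used SAS, the function names are hypothetical, and Python's `random.Random.gauss` stands in for the SAS normal generator.

```python
import math
import random


def correlated_effects(n, var_x, var_y, cov_xy, rng):
    """Draw n pairs (E_x, E_y) with the designed variances and covariance."""
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    residual_var = var_y - cov_xy ** 2 / var_x
    if residual_var < 0:
        raise ValueError("covariance is too large for the given variances")
    sd_x = math.sqrt(var_x)
    residual_sd = math.sqrt(residual_var)
    e_x, e_y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # independent standard normal deviates
        y = rng.gauss(0.0, 1.0)
        e_x.append(x * sd_x)
        e_y.append(x * cov_xy / sd_x + y * residual_sd)
    return e_x, e_y


def sample_covariance(u, v):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)
```

For example, `correlated_effects(100, 0.05, 0.125, 0.059, random.Random(1))` would draw family effects for 100 parents with a designed correlation of about 0.75; these particular values are illustrative, not taken from the study.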

Block effects can be major sources of variation that contribute to differences among
genetic group means when data are unbalanced. This is because block effects are confounded
with genetic effects due to non-orthogonality. The assumption of fixed block effects can help
remove these confounding effects. Therefore, block effects were intentionally treated as fixed
effects in this study. However, for convenience in data generation, blocks were created as if
they were random samples from a large normally distributed population ($\mu_\beta = 0$, $\sigma^2_\beta = 2$), so
that the observed variation among blocks was twice as large as the phenotypic variation
within a block (i.e., $\sigma^2_\beta = 2$).

After balanced data were generated, 30% mortality was simulated in all data samples
by randomly deleting observations. For each set of designed genetic parameters between two
environments ($h^2_1$, $h^2_2$ and $r_B$), 500 independent simulation runs were performed.
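
Random deletion of observations can be sketched as below. The text does not say whether exactly 30% of the records were removed or each tree died independently with probability 0.3; the sketch assumes the former, and the function name and layout are hypothetical.

```python
import random


def apply_mortality(records, rate, rng):
    """Delete round(rate * N) records chosen at random without replacement."""
    n_dead = round(rate * len(records))
    dead = set(rng.sample(range(len(records)), n_dead))
    return [rec for i, rec in enumerate(records) if i not in dead]


# Hypothetical balanced layout: 5 families x 10 blocks, single-tree plots.
records = [(family, block) for family in range(5) for block in range(10)]
survivors = apply_mortality(records, rate=0.30, rng=random.Random(2024))
per_family = {f: sum(1 for fam, _ in survivors if fam == f) for f in range(5)}
```

After deletion the families generally differ in their number of survivors and in the blocks they occupy, which is the within-environment imbalance the comparison is meant to create.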

Methods  of  comparison 

Type B genetic correlations were estimated for each data sample using four different
estimating formulas: Yamada method I (Eq.4-1); Burdon method (Eq.4-2); the GCA
approach proposed in this study; and an alternative Yamada’s formula II, which has often
been used in the form:

$$\hat{r}_B = \frac{\hat{\sigma}^2_g}{\hat{\sigma}^2_g + \hat{\sigma}^2_I} \qquad \text{(4-16)}$$

where the definition of each element is the same as in Eq.4-1.
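
The difference between the two Yamada formulas can be sketched numerically. The sketch assumes that Eq.4-1 has the heterogeneity-corrected form, in which half the squared difference of the within-environment genetic standard deviations is subtracted from the denominator; the function names and input values are hypothetical.

```python
import math


def yamada_ii(var_g, var_i):
    """Eq. 4-16: genetic variance over genetic plus interaction variance."""
    return var_g / (var_g + var_i)


def yamada_i(var_g, var_i, var_g1, var_g2):
    """Assumed form of Eq. 4-1: the interaction variance is reduced by the
    part that arises purely from unequal genetic variances."""
    heterogeneity = 0.5 * (math.sqrt(var_g1) - math.sqrt(var_g2)) ** 2
    return var_g / (var_g + var_i - heterogeneity)


# Hypothetical parameters: genetic variances 1 and 4, true correlation 0.75.
# Then the pooled genetic variance is 0.75 * 1 * 2 = 1.5 and the interaction
# variance is (1 + 4) / 2 - 1.5 = 1.0.
r_i = yamada_i(var_g=1.5, var_i=1.0, var_g1=1.0, var_g2=4.0)
r_ii = yamada_ii(var_g=1.5, var_i=1.0)
```

With these inputs the corrected formula recovers the designed correlation while formula II falls below it, in line with formula II being the one method that differed from the others on balanced data with heterogeneous variances.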

Two main criteria were used for evaluating the four estimation methods. First, empirical
bias was calculated as the difference between means of estimated and true type B genetic
correlations over 500 random samples for a given population, i.e., $Bias = \bar{\hat{r}}_B - \bar{r}_{tB}$, where
$\bar{\hat{r}}_B = \frac{1}{N}\sum_{i=1}^{N} \hat{r}_{Bi}$, with $\hat{r}_{Bi}$ being the estimated type B genetic correlation of the ith sample;
$\bar{r}_{tB} = \frac{1}{N}\sum_{i=1}^{N} r_{tBi}$, with $r_{tBi}$ being the true type B genetic correlation of the ith sample; and N is
the total number of random samples.

Analysis of variance (ANOVA) was conducted for each level of the genetic
parameters to detect the statistical significance of differences between the estimated and true
type B genetic correlations. LSD multiple comparisons were subsequently performed when
ANOVA revealed significant differences. The biases that differed significantly from zero were
marked.

The second criterion used in the evaluation was the mean-distance (MD) between the
estimated and true type B genetic correlations, which was calculated as:

$$MD = \frac{1}{N}\sum_{i=1}^{N} \left|\hat{r}_{Bi} - r_{tBi}\right|$$

As for bias, analysis of variance and multiple comparisons (LSD) were performed to detect
statistical differences among the four estimation methods under each level of genetic
parameter. For each level of genetic parameters, the estimation precision of Yamada II was
used as the comparison standard, and estimation precisions of Yamada I, Burdon, and the GCA
approach were identified if they were significantly higher or lower than that of Yamada II.

Due to sampling errors, the estimates of variance components may result in
estimates of type B genetic correlation beyond the parameter space (i.e., $\hat{r}_B < 0$ or $\hat{r}_B > 1.0$). In
these cases, estimated correlations were set to the theoretical boundary value (0 or 1.0)
following Singh et al. (1997).
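
The two evaluation criteria and the boundary adjustment can be sketched together. The function names and the three example estimates are hypothetical.

```python
def clip_to_parameter_space(r, lower=0.0, upper=1.0):
    """Set an out-of-range estimate to the nearest boundary value."""
    return min(max(r, lower), upper)


def empirical_bias(estimates, truths):
    """Mean of the estimates minus mean of the true correlations."""
    return sum(estimates) / len(estimates) - sum(truths) / len(truths)


def mean_distance(estimates, truths):
    """Average absolute difference between estimated and true values."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(estimates)


# Hypothetical estimates from three samples, two of them out of range.
raw = [0.7, 1.2, -0.1]
truths = [0.75, 0.75, 0.75]
clipped = [clip_to_parameter_space(r) for r in raw]
bias = empirical_bias(clipped, truths)
md = mean_distance(clipped, truths)
```

Bias lets errors of opposite sign cancel while MD does not, so a method can have a small bias and still a large MD.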

Results  of  Numerical  Comparisons 
Bias 

When data were balanced, all methods except for Yamada II yielded identical
estimates even though genetic and environmental variances were heterogeneous (data not
shown). When data were balanced within an environment but were unbalanced between
environments in terms of the number of replications (blocks), Burdon’s method (Eq.4-2) and
the GCA approach (Eq.4-12) yielded identical estimates of type B genetic correlations,
which generally had smaller empirical biases than the Yamada methods (Table 4-1). The
larger magnitudes of biases from Yamada’s formulae (I and II) were related to the differences
in heritabilities and the number of replications between two environments. The larger the
difference between two environments in these factors, the greater the biases from Yamada’s
formulae.

When  data  were  unbalanced  within  an  environment  due  to  missing  values  caused  by 
mortality  (30%),  but  approximately  equally  replicated  across  environments  in  terms  of  the 
number  of  blocks,  Burdon’s  formula  (Eq.4-2)  generally  yielded  the  largest  downward  biases. 
In contrast, for such a data structure, Yamada’s methods yielded almost unbiased estimates.
When  data  were  unbalanced  both  within  and  between  environments,  the  GCA  approach 
consistently  yielded  the  smallest  biases  which  were,  for  most  cases,  not  significantly 
different  from  zero  (Table  4-1). 

Estimation  precision 

The  precision  of  estimates  of  type  B genetic  correlation  is  represented  by  the  mean- 
distance  (MD)  between  estimated  and  true  type  B genetic  correlations.  The  MDs  were 
affected  by  both  the  heritabilities  in  two  environments  and  the  estimating  methods  when  data 
were  unbalanced  (Table  4-2).  For  all  methods,  the  higher  the  heritability  in  two 
environments,  the  smaller  the  MD.  When  the  lower  heritability  between  two  environments 
was  accompanied  by  a larger  number  of  replications,  Burdon’s  methods  (Eq.4-2)  and  the 
GCA  approach  tended  to  yield  smaller  MD.  Whereas,  when  the  lower  heritability  between 
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Table  4-1.  Empirical  biases  of  type  B genetic  correlations  from  four  estimating  methods  under 
different  genetic  architectures  in  two  environments  as  represented  by  their  narrow  sense  heritabilities 
(h,^  and  and  true  type  B genetic  correlations.  Experimental  design  is  randomized  complete  block 

design  with  single-tree  plots  in  both  environments. 


Env.  1 

Env.  1 

True 

Env.  1 

Env.  1 

Mortality 

Empirical  Biases  of  rg  estimates 

h,^ 

W 

bl 

b2 

% 

Yamada 

Yamada 

Burdon 

GCA 

I 

II 

Approach 

0.2 

0.5 

0.75 

40 

20 

0 

0.064* 

-0.031* 

0.015 

0.015 

40 

20 

30 

0.068* 

-0.025 

-0.134* 

0.007 

20 

40 

0 

-0.005 

-0.082* 

0.014* 

0.014 

20 

40 

30 

-0.059* 

-0.140* 

-0.166* 

-0.014 

40 

10 

0 

0.142* 

0.050* 

0.021 

0.021 

10 

40 

0 

-0.068* 

-0.123* 

0.005 

0.005 

0.2 

0.4 

0.75 

40 

40 

30 

0.000 

-0.057* 

-0.135* 

0.014 

40 

20 

0 

0.052* 

-0.006 

0.014 

0.014 

40 

20 

30 

0.053* 

-0.002 

-0.153* 

0.009 

20 

40 

0 

-0.010 

-0.063* 

0.020 

0.020 

20 

40 

30 

-0.066* 

-0.117* 

-0.180* 

0.020 

40 

10 

0 

0.104* 

0.057* 

0.024 

0.024 

10 

40 

0 

-0.068* 

-0.123* 

0.005 

0.005 

0.2 

0.3 

0.75 

20 

20 

30 

0.029 

-0.002 

-0.180* 

0.023 

20 

10 

0 

0.052* 

0.022 

0.013 

0.013 

20 

10 

30 

-0.014 

-0.050* 

-0.212* 

0.001 

10 

20 

0 

-0.015 

-0.052* 

0.005 

0.005 

10 

20 

30 

-0.022 

-0.064* 

-0.212* 

-0.031* 

0.2 

0.1 

0.75 

40 

4 

30 

0.000 

-0.053* 

-0.161* 

-0.005 

40 

20 

0 

0.074* 

0.026 

0.035* 

0.035* 

20 

10 

30 

-0.013 

-0.003 

-0.183* 

-0.012 

10 

20 

0 

-0.069* 

-0.089* 

-0.014 

-0.014 

20 

40 

30 

-0.043* 

-0.032* 

-0.165* 

-0.019 

0.1 

0.4 

0.60 

20 

20 

30 

0.010 

-0.115* 

-0.122* 

0.008 

20 

10 

0 

0.153* 

-0.009 

0.020 

0.020 

20 

10 

30 

0.111* 

0.017 

-0.156* 

0.021 

10 

20 

0 

0.007 

-0.116* 

0.030* 

0.030* 

10 

20 

30 

0.055* 

-0.181* 

-0.193* 

0.073* 

Note:  Biases  are  calculated  based  on  500  simulated  random  samples  for  each  combination  of  genetic 
architecture  and  data  imbalance;  bl  and  b2  are  the  numbers  of  blocks  in  environment  1 and  2.  *Bias  is 
significant  different  from  zero  at  the  probability  level  of  a i0.05. 
Table 4-2. Mean-distance (MD) between estimated and true type B genetic correlations from four estimation methods under different genetic architectures in two environments (as represented by their narrow-sense heritabilities, h1² and h2², and true type B genetic correlations, rB). The experimental design is randomized complete block with single-tree plots in both environments.

h1²      h2²      True   b1       b2       Mortality   Mean-distance
(Env. 1) (Env. 2) rB     (Env. 1) (Env. 2) (%)         Yamada I   Yamada II   Burdon     GCA approach
0.2      0.5      0.75   40       20       0           0.108*+    0.095       0.086*-    0.086*-
                         40       20       30          0.137      0.123       0.177*+    0.117
                         20       40       0           0.101      0.110       0.111      0.111
                         20       40       30          0.143*-    0.165       0.213*+    0.144*-
                         40       10       0           0.173*+    0.127       0.113*-    0.113*-
                         10       40       0           0.157      0.158       0.174      0.174
0.2      0.4      0.75   40       40       30          0.175      0.167       0.248*+    0.175
                         40       20       0           0.107      0.096       0.093      0.093
                         40       20       30          0.138      0.127       0.198*+    0.125
                         20       40       0           0.106      0.112       0.113      0.113
                         20       40       30          0.151      0.161       0.235*+    0.148
                         40       10       0           0.157      0.136       0.128*-    0.128*-
                         10       40       0           0.164      0.154       0.173*+    0.173*+
0.2      0.3      0.75   20       20       30          0.132      0.128       0.223*+    0.131
                         20       10       0           0.141      0.136       0.135      0.135
                         20       10       30          0.140      0.138       0.253*+    0.179*+
                         10       20       0           0.137      0.135       0.135      0.135
                         10       20       30          0.183      0.181       0.292*+    0.199
0.2      0.1      0.75   40       40       30          0.175      0.167       0.248*+    0.175
                         40       20       0           0.189      0.199       0.169*-    0.169*-
                         40       20       30          0.211*-    0.206       0.294*+    0.213
                         20       40       0           0.201      0.174       0.206*+    0.206*+
                         20       40       30          0.256      0.202       0.320*+    0.267*+
0.1      0.4      0.60   20       20       30          0.195      0.180       0.237*+    0.198
                         20       10       0           0.168      0.164       0.166      0.166
                         20       10       30          0.291*+    0.214       0.253*+    0.222
                         10       20       0           0.184      0.194       0.221      0.221
                         10       20       30          0.231      0.229       0.324*+    0.274

Note: Mean-distance is calculated based on 500 simulated random samples for each combination of genetic architecture and data imbalance; b1 and b2 are the numbers of blocks in environments 1 and 2. *- and *+ indicate that MD calculated from Yamada I, Burdon or the GCA approach is significantly smaller (better) or larger (worse), respectively, than MD calculated from Yamada II at the probability level α ≤ 0.05.
two environments was accompanied by fewer replications, Yamada’s methods I and II (Eq. 4-1 and Eq. 4-16) resulted in smaller MD. Burdon’s method consistently yielded considerably larger MD than the other methods when 30% mortality was present in the experimental data. In all cases, the GCA approach had MDs less than or equal to those of Burdon’s method, depending on data imbalance (Table 4-2).

When the estimation precision of Yamada II was used as the standard for comparison, the precision of Yamada I or the GCA approach differed significantly only for a few levels of genetic parameters, at which it was either significantly higher (MD-) or lower (MD+) than that of Yamada II (Table 4-2). For the majority of levels of genetic parameters, however, these three estimation methods showed no significant differences. For Burdon’s method, the estimation precision was almost always significantly lower than that of Yamada II when 30% mortality was present in the data.

Out-of-bound  estimates 

Estimates of type B genetic correlation from the simulated data sets often fell outside the theoretical parameter space of 0 to 1 (Burdon 1977). All methods except Yamada II yielded such results, especially when the true population heritabilities were low (data not shown). When true population heritabilities were 0.1 to 0.2 in the simulated data, up to 25% of estimates of type B genetic correlations were either greater than 1 or less than zero even though the data were balanced. The out-of-bound estimates were mainly caused by near-zero estimates of genetic variance(s) from one or both of the testing environments. Because these variance estimates appear in the denominators of the estimating equations, near-zero values produced erratic estimates of type B genetic correlation. Figure 4-1 shows the effects of near-zero estimates of family variance components on the number of out-of-bound estimates of type B genetic correlations. The number of out-of-bound estimates became steadily smaller when the true population heritabilities in the two environments were higher (>0.2). There was, however, no significant difference among Yamada I, Burdon and the GCA approach in producing out-of-bound estimates for a given population genetic structure.

[Figure 4-1. Effects of genetic variance component estimates on the number of out-of-bound estimates of type B genetic correlation. Panels: a. between environment 1 (h² = 0.4) and environment 2 (h² = 0.1); b. between environment 1 (h² = 0.3) and environment 2 (h² = 0.1) (true population rB = 0.6). Axis labels: estimates of family variance components; estimates of type B genetic correlations.]
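
The mechanism behind out-of-bound estimates can be illustrated with a small numerical sketch. It assumes a generic ratio of the form covariance / sqrt(variance 1 × variance 2), used here only as a stand-in for estimating equations that carry within-environment genetic variance estimates in the denominator; the function name and the input values are hypothetical.

```python
import math

def ratio_correlation(cov_12, var_1, var_2):
    """Generic correlation-type ratio: cov / sqrt(var_1 * var_2).

    Stand-in for an estimating equation whose denominator contains
    within-environment genetic variance estimates (an assumption made
    for illustration, not one of the chapter's exact formulas).
    """
    if var_1 <= 0 or var_2 <= 0:
        return float("nan")  # ratio undefined for non-positive variance estimates
    return cov_12 / math.sqrt(var_1 * var_2)

# Hypothetical family-level components with a correlation of about 0.5.
true_r = ratio_correlation(0.0177, 0.0500, 0.0250)

# Same covariance estimate, but the variance estimate in environment 2
# happens to fall close to zero because of sampling error.
shrunk_r = ratio_correlation(0.0177, 0.0500, 0.0040)

print(f"well-estimated variances:    r = {true_r:.2f}")
print(f"near-zero variance in env 2: r = {shrunk_r:.2f}")
```

In this constructed case the second ratio exceeds 1 although the covariance is unchanged, which is the pattern described for low-heritability environments.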

Correlation  between  true  and  estimated  type  B genetic  correlations 

Pearson correlation coefficients between the true and estimated genetic correlations are given in Table 4-3. Each Pearson correlation coefficient was calculated from 500 random data samples, each providing a pair of true and estimated type B genetic correlations from a given estimation method under a given level of genetic parameters. A higher correlation between the true and estimated type B genetic correlations indicates a better correspondence between them and, therefore, a higher quality of the estimation method. In this simulation study, the ANOVA-based methods (i.e., Yamada I and II) showed detectable differences from the correlation-based methods (i.e., the Burdon and GCA approaches). For data with fewer replications in the environment of lower heritability, the former seemed to be better, whereas for data with fewer replications in the environment of higher heritability, the latter seemed to be better. In general, the GCA approach showed consistent improvement over Burdon’s approach, and Yamada II performed better than Yamada I.
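
The evaluation criterion used here can be sketched in a few lines. The example below is a minimal illustration of the Pearson correlation between true and estimated values across replicate samples; the five value pairs are hypothetical and are not results from this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical replicate results: true and estimated type B correlations.
true_rb = [0.70, 0.78, 0.81, 0.66, 0.75]
est_rb = [0.62, 0.85, 0.90, 0.41, 0.71]
print(round(pearson(true_rb, est_rb), 3))
```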
Table 4-3. Correlation between estimated and true type B genetic correlations from four estimation methods under different genetic architectures in two environments, as represented by their narrow-sense heritabilities (h1² and h2²) and true type B genetic correlations (rB). The experimental design is a randomized complete block design with single-tree plots in both environments.

h1²      h2²      True   b1       b2       Mortality   Correlation coefficients
(Env. 1) (Env. 2) rB     (Env. 1) (Env. 2) (%)         Yamada I   Yamada II   Burdon     BV (GCA) approach
0.2      0.5      0.75   40       20       0           0.602      0.600       0.642      0.642
                         40       20       30          0.482      0.501       0.431      0.533
                         20       40       0           0.581      0.590       0.546      0.546
                         20       40       30          0.420      0.454       0.370      0.394
                         40       10       0           0.502      0.547       0.578      0.578
                         10       40       0           0.489      0.438       0.387      0.387
0.2      0.4      0.75   40       40       30          0.522      0.510       0.411      0.520
                         40       20       0           0.498      0.470       0.518      0.518
                         40       20       30          0.397      0.413       0.352      0.434
                         20       40       0           0.458      0.459       0.433      0.433
                         20       40       30          0.304      0.334       0.216      0.292
                         40       10       0           0.432      0.465       0.479      0.479
                         10       40       0           0.354      0.395       0.319      0.319
0.2      0.3      0.75   20       20       30          0.356      0.378       0.237      0.359
                         20       10       0           0.329      0.328       0.328      0.328
                         20       10       30          0.203      0.247       0.086      0.123
                         10       20       0           0.315      0.349       0.319      0.319
                         10       20       30          0.123      0.138       0.076      0.168
0.2      0.1      0.75   40       40       30          0.273      0.312       0.079      0.275
                         40       20       0           0.310      0.322       0.345      0.345
                         40       20       30          0.301      0.306       0.192      0.269
                         20       40       0           0.286      0.333       0.278      0.278
                         20       40       30          0.220      0.231       0.142      0.212
0.1      0.4      0.60   20       20       30          0.267      0.265       0.144      0.272
                         20       10       0           0.417      0.438       0.466      0.466
                         20       10       30          0.237      0.248       0.259      0.280
                         10       20       0           0.373      0.412       0.271      0.278
                         10       20       30          0.226      0.258       0.089      0.187

Note: b1 and b2 are the numbers of blocks in environments 1 and 2.
Discussion

The Yamada and Burdon methods of estimating type B genetic correlation have been routinely used in forest genetic data analysis, but the properties of their estimates are not well understood. Simulation studies with data from populations of known genetic parameters provided useful information about the desirable and undesirable properties of estimates from these methods.

The new GCA approach, which uses parental GCA effects predicted by BLUP, showed considerable improvement over Burdon’s method: it produced smaller empirical bias, smaller mean distance and a closer relationship between the true and estimated type B genetic correlations when data were unbalanced and had heterogeneous variances. This is consistent with the theoretical considerations discussed in this chapter.

For the Yamada (1962) methods (I and II), the absolute values of empirical biases were not numerically large, although most of them were statistically different from zero. This was especially true when data samples were not extremely unbalanced between the two environments and the heterogeneity of variances was not very large. Larger absolute biases were observed only in a few cases where data were extremely unbalanced between the two environments in terms of their relative sizes (i.e., the numbers of blocks) and the heterogeneity was large (h1²/h2² > 2). Compared with the large sampling errors of the estimates of type B genetic correlations at low heritabilities (Table 4-2), the magnitudes of these biases may be of little practical significance. In this sense, the results suggest that Yamada’s methods can reasonably be used in most forest genetic tests that involve moderately unbalanced data, even though the methods have some theoretical inadequacies (Fernando et al. 1984).

Yamada II (Eq. 4-16) was the only univariate method that consistently yielded estimates of type B genetic correlation within the theoretical parameter boundaries (0 to 1) when negative variance estimates were set to zero. Because the other three estimating methods cannot effectively control the effect of sampling error in variance components on the estimation of type B genetic correlation, Eq. 4-16 (Yamada II) was empirically the most robust univariate method for estimating type B genetic correlation. Although this approach is not theoretically well defined for data with heterogeneous variances (Yamada 1962), it produced estimates with empirically equal or more desirable properties than Yamada I (Eq. 4-1) and Burdon’s method. As long as the two environments differ little in the numbers of replications and heritabilities, the empirical bias from this approach is minimal and may be negligible (Table 4-1) after data standardization has removed scale effects. Considering its other benefits, such as computational convenience and keeping estimates of genetic correlations within the theoretical parameter space for the purpose of indirect selection, this method would be empirically more suitable than the Yamada I and Burdon approaches. If heterogeneity of variances among environments is extremely large, however, this method is known to yield severe biases (Dutilleul and Carriere 1998).

Burdon’s formula is sensitive to mortality if block effects are large. This result may be partially attributable to the relatively large block effects simulated in this study. Large block effects are common, however, in forest genetic tests given the heterogeneous testing environments of forest lands. Like the GCA approach, Burdon’s method is robust to unequal data sizes and heterogeneous variances between two environments as long as mortality is low within a progeny test. But when mortality is high and block effects are large, this method generally produces estimates with large downward biases and significantly lower estimation precision, making it inferior to the other methods.

For convenience, the derivation of the GCA approach and the simulations were based on half-sib families. The method, however, may be extendable to estimating additive type B genetic correlations with full-sib families. For full-sib families, the predicted parental GCA effects, not the full-sib family means, are used in Eq. 4-12, so the estimates of type B genetic correlations should be additive. It should be noted that for full-sib families, estimates of type B genetic correlation from Burdon’s method are not additive genetic correlations but genotypic correlations, because additive genetic effects and dominance effects are not separated in that method. Since estimates of additive type B genetic correlations are needed to predict genetic gains from indirect selection (Falconer 1989), further studies may be necessary to examine the ability of estimation methods to estimate additive type B genetic correlations from full-sib genetic test data.

Like the methods of Yamada I and Burdon, the GCA approach only estimates type B genetic correlations between pairs of environments. To obtain an average estimate of type B genetic correlation among multiple environments, one may either take the average of paired estimates (Dieters et al. 1995) or pool the data of appropriate environments and predict parental GCA effects with more complex mixed linear models. In this regard, the GCA approach is not as computationally efficient as Yamada II. But when the assumption of homogeneous variances cannot be made or the experimental designs differ among experiments, it can be a useful alternative.

Conclusion 

The GCA approach to type B genetic correlation estimation provides a useful tool for handling unbalanced data with heterogeneous variances. This method is relatively easy to use when BLUP computer packages are available. Because the GCA approach calculates type B genetic correlations after properly removing fixed effects and adjusting for heterogeneous variances, its estimates generally have better or equal properties in terms of unbiasedness and precision compared with other methods. For experimental data composed of different field experimental designs, severe data imbalances and large heterogeneous variances, the GCA approach is a viable option for estimating type B genetic correlations. Yamada’s (1962) methods may be appropriate for estimating type B genetic correlation with acceptable or negligible biases for slightly or moderately unbalanced forest genetic test data that have moderately heterogeneous variances.


CHAPTER  5 

COMPARISON  OF  MULTIVARIATE  AND  UNIVARIATE  METHODS 
FOR  ESTIMATING  TYPE  B GENETIC  CORRELATIONS 

Introduction 

In quantitative forest genetic data analysis, type B genetic correlations (Burdon 1977) have been routinely estimated using the univariate methods of Yamada (1962) and Burdon (1977) (Burdon 1977; Johnson and Burdon 1990; Woolaston et al. 1991; Hodge and White 1992; Hodge and Purnell 1993; Adams et al. 1994; Dieters et al. 1995; Dieters 1996; Pswarayi et al. 1997). For pairs of genetic tests, these methods first estimate genetic variances and covariances using univariate linear models in one and/or two environments separately and then calculate genetic correlations according to the described procedures (Yamada 1962; Burdon 1977). Although these univariate methods provide considerable flexibility in minimizing computational demands and in accommodating suitable computer software (such as SAS®), arguments suggest that estimates of genetic correlations from these univariate methods are less satisfactory for some data structures (Fernando et al. 1984; Dutilleul and Carriere 1998).

One of the undesirable aspects of these univariate methods is their inability to yield unbiased estimates of type B genetic correlations for unbalanced experimental data accompanied by heterogeneous variances across environments. For example, theoretical considerations and empirical evidence have suggested that serious biases are associated with estimates of type B genetic correlation from Yamada’s (1962) methods when data are unbalanced in terms of experiment sizes and when heterogeneous variances are present across environments (Fernando et al. 1984; Ito and Yamada 1990; Dutilleul and Carriere 1998). Severe biases in estimates of type B genetic correlations were also found with Burdon’s method when mortality causes data imbalance within a genetic test (Chapter 4 of this dissertation). Although univariate methods can be improved (Dutilleul and Carriere 1998; Chapter 4 of this dissertation) so that unbiased estimates of type B genetic correlations are still obtainable for unbalanced data with heterogeneous variances, some of the procedures become more computationally complex and inconvenient.

Another concern with univariate methods is that type B genetic correlation estimates often fall outside the theoretical parameter space, so the results are difficult to use in practice. For instance, both empirical evidence (Koots and Gibson 1996) and a simulation study (Chapter 4 of this dissertation) have indicated that the frequency of out-of-bound estimates increases as true population heritabilities decrease. Out-of-bound estimates often occur when near-zero estimates of genetic variance are obtained in one or both of the environments concerned, while the estimates of genetic covariance are relatively large. Out-of-bound estimates are primarily attributable to sampling errors of genetic variances and covariances, but may also be related to the fact that genetic variances and covariances are not estimated from a closed system with these univariate methods.

A third potential problem facing univariate methods of type B genetic correlation estimation is that genetic relatedness and inbreeding among and within genetic groups cannot be properly accounted for. Since the possibility of genetic relatedness among genetic groups (especially among full-sib families) increases as tree improvement programs progress into advanced generations (White et al. 1993; Borralho and Dutkowski 1998), the assumption of univariate statistical methods that genetic groups are independent random samples of a large population is violated if genetic relatedness exists. Failure to account for genetic relatedness and inbreeding may cause inaccurate estimates of genetic variances and covariances and potentially result in biased estimates of type B genetic correlations.

Multivariate methods can estimate genetic variances and covariances simultaneously using the restricted maximum likelihood (REML) approach with an iterative procedure (Patterson and Thompson 1971; Schaeffer and Wilton 1978). For multivariate methods, measurements from different environments are treated as different traits with different variance and covariance structures. Consequently, the problem of heterogeneous variances facing univariate methods is properly addressed. The REML approach is generally considered more desirable than ANOVA (analysis of variance) methods in handling unbalanced data for the purpose of variance component estimation (Searle et al. 1992; Huber et al. 1994). In addition, some multivariate methods can apply constraints to estimates of genetic variances and covariances so that estimates of genetic correlations stay within the theoretical parameter space (Boldman et al. 1995). Furthermore, multivariate REML methods can make use of pedigree information so that genetic relatedness among genetic groups is properly treated in the process of variance component estimation (Boldman et al. 1995; Gilmour et al. 1997).
Despite the potentially advantageous properties of multivariate methods in estimating type B genetic correlations, uncertainty remains as to whether constrained multivariate procedures yield unbiased estimates. It is also unclear how many environments should be used in a constrained system to enhance the quality of estimates. The purpose of this study was to numerically compare some multivariate and commonly used univariate methods for type B genetic correlation estimation using simulated forest genetic data. Specifically, we examined these estimation methods in terms of unbiasedness and precision, as well as the distributional properties of the estimates under different genetic parameters.

Material  and  Methods 

Data  Generation 

Simulated data were used in the numerical comparisons because for each simulated data set the true underlying type B genetic correlation is known and can be used to evaluate the quality of estimates. Data were generated based on a randomized complete block (RCB) design with one tree per family per block (i.e., single-tree plots), which is recommended by several studies in forest genetic testing (Lambeth and Gladstone 1983; Loo-Dinkins and Tauer 1987; Loo-Dinkins et al. 1990; White 1996). Genetic structures of the data were simulated based on half-sib families created from a polymix mating design with 120 female parents.

In the field experimental designs, it was assumed that these 120 half-sib families were tested over 4 environments, each having 90 families and 20 blocks. It was further assumed that there were 60 half-sib families in common for any pair of progeny tests, but that no family was common to all 4 environments. The linear model used in data generation across the 4 testing environments is given in matrix notation as:

$$ y_i = \mu_i 1_i + X_i \beta_i + Z_i g_i + Z_{mi} g_{mi} + e_i \qquad (5\text{-}1) $$

where $y_i$ is an $n_i \times 1$ vector of phenotypic observations in environment $i$, $i = 1, \ldots, 4$; $n_i$ is the number of observations in environment $i$; $\mu_i$ is the overall mean in environment $i$ and $1_i$ is an $n_i \times 1$ vector of 1s; $X_i$ is the incidence matrix relating to block effects (vector $\beta_i$) in environment $i$; $Z_i$ is the incidence matrix relating to female parental genetic effects (vector $g_i$) in environment $i$; $Z_{mi}$ is the incidence matrix relating to the genetic effects of Mendelian sampling (vector $g_{mi}$) in environment $i$; and $e_i$ is the $n_i \times 1$ vector of residuals in environment $i$. Covariance between random effects (i.e., female parent and residual) in the model was assumed nil, such that

$E(y_i) = \mu_i 1_i + X_i \beta_i$, $E(g_i) = 0$, $E(g_{mi}) = 0$, and $E(e_i) = 0$;
The phenotypic value of each individual was determined as the sum of all genetic and environmental effects in the model. The levels of each effect were assumed to be a random sample from a large normal population. Independent standard normal deviates ($\mu = 0$, $\sigma^2 = 1$) were created using the SAS RANNOR function (SAS® Institute Inc. 1990) to reflect random variation within each effect. The magnitude of variation within each effect was determined by its designed variance. Correlated female additive genetic effects among the 4 testing environments were created as:

$$ A = B'C \qquad (5\text{-}2) $$

where A is the matrix of additive genetic effects of female parents, B is the square root (Cholesky decomposition) of the designed genetic variance-covariance matrix G, and C is a column vector of independent standard normal random deviates, such that

$$ \mathrm{Var}(A) = B'\,\mathrm{Var}(C)\,B = B'B = G \qquad (5\text{-}3) $$
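
Eq. 5-2 and Eq. 5-3 can be illustrated with a short sketch written in Python rather than SAS. It assumes that G is built as rB × sqrt(v_i × v_j), with the family variance v equal to one quarter of the heritability when the phenotypic variance is 1; this reproduces the designed values listed in Table 5-1. The function names and the random seed are hypothetical.

```python
import math
import random

H2 = [0.40, 0.30, 0.20, 0.10]            # designed heritabilities (Table 5-1)
RB = [[1.0, 0.9, 0.8, 0.7],              # designed type B genetic correlations
      [0.9, 1.0, 0.7, 0.6],
      [0.8, 0.7, 1.0, 0.5],
      [0.7, 0.6, 0.5, 1.0]]

def build_g(h2, rb):
    """Family-level variance-covariance matrix: G[i][j] = rb * sqrt(v_i * v_j)."""
    v = [h / 4.0 for h in h2]            # assumed family variance, phenotypic variance = 1
    n = len(v)
    return [[rb[i][j] * math.sqrt(v[i] * v[j]) for j in range(n)] for i in range(n)]

def cholesky_lower(g):
    """Lower-triangular L with L L' = G (L plays the role of B' in Eq. 5-2)."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(g[i][i] - s)
            else:
                low[i][j] = (g[i][j] - s) / low[j][j]
    return low

def correlated_effects(low, rng):
    """Genetic effects of one parent in each environment: A = L C."""
    c = [rng.gauss(0.0, 1.0) for _ in low]   # independent standard normal deviates
    return [sum(low[i][k] * c[k] for k in range(i + 1)) for i in range(len(low))]

G = build_g(H2, RB)
L = cholesky_lower(G)
rng = random.Random(1)                   # arbitrary seed
parents = [correlated_effects(L, rng) for _ in range(120)]
```

Because L L' = G, effects drawn this way have covariance G in expectation, which is the property stated in Eq. 5-3.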

Heterogeneous genetic variances among progeny tests are reflected by the genetic variances and covariances in matrix G. The designed population genetic parameters, such as heritabilities and type B genetic correlations, were intentionally given relatively large variation among the 4 environments in order to represent the wide range of situations that may exist in real forest genetic data sets (Table 5-1). Without loss of generality, the phenotypic variance within each progeny test was set to 1.0, because data standardization is highly recommended in forest genetic data analysis (Wu 1993; White 1996; Dieters 1996) to remove scale effects, and standardization can always adjust the phenotypic variance to 1 within a single environment.

Block effects were treated as fixed effects in this study for three reasons: (1) variance component estimation for the random female genetic effect is not affected under customary linear models (i.e., no covariance assumed between random effects) whether block is treated as a random or a fixed effect; (2) treating block as a fixed effect facilitates the application of multivariate methods in this study; and (3) the assumption of fixed block effects helps prevent block effects from being confounded with genetic effects when data are unbalanced. However, for convenience in data generation, levels of block were created with the assumption that they were random samples from a normally distributed population ($\mu_b = 0$, $\sigma^2_b = 2$), and that the observed variation among blocks was twice as large as the phenotypic variation within a block (i.e., $\sigma^2_b = 2$).

After  data  were  generated,  30%  mortality  was  randomly  simulated  in  all  progeny 
tests  by  random  deletion.  A total  of  300  independent  simulated  data  sets,  each  containing  4 
environments  and  20  blocks,  were  generated  and  analyzed  in  this  study. 
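
The mortality step can be sketched as follows. This is a minimal illustration, assuming that random deletion means removing a simple random sample of exactly 30% of the trees in a test; the record layout, function name and seed are hypothetical.

```python
import random

def apply_mortality(records, rate, rng):
    """Return the records that survive after deleting a random fraction `rate`."""
    n_dead = round(rate * len(records))
    dead = set(rng.sample(range(len(records)), n_dead))
    return [rec for idx, rec in enumerate(records) if idx not in dead]

# One simulated test: 20 blocks x 90 families, one tree per family per block.
trees = [(block, family) for block in range(1, 21) for family in range(1, 91)]
rng = random.Random(2024)                 # arbitrary seed
survivors = apply_mortality(trees, 0.30, rng)
print(len(trees), len(survivors))
```

Deleting trees independently with probability 0.30 would be an equally plausible reading; it gives a mortality rate that varies slightly from test to test.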

Table 5-1. Designed heritabilities and type B genetic correlations for an arbitrary continuous trait among four simulated environments.

Heritability (diagonal) and type B genetic correlations (above diagonal)

Environment       1        2        3        4
1              0.40     0.90     0.80     0.70
2                       0.30     0.70     0.60
3                                0.20     0.50
4                                         0.10

Designed genetic variance and covariance matrix (G)

Environment       1        2        3        4
1            0.1000   0.0779   0.0566   0.0350
2                     0.0750   0.0429   0.0260
3                              0.0500   0.0177
4                                       0.0250

Note: Diagonal elements of the upper panel (shaded in the original) are narrow-sense heritabilities of the four environments.
Estimation of Type B Genetic Correlations

Multivariate methods

The multivariate computer programs MTDFREML (Boldman et al. 1995) and ASREML (Gilmour et al. 1997) were used to analyze each of the 300 data sets by treating the measurements of an arbitrary continuous trait from different environments as different traits. MTDFREML uses a simplex method to reach convergence in variance component estimation and allows estimates of genetic correlations to be constrained within the theoretical parameter space. ASREML, on the other hand, uses an average information algorithm (Gilmour et al. 1997) and sparse matrix techniques to solve large mixed models efficiently; it does not constrain estimates of genetic correlations within the theoretical parameter space. Therefore, results from MTDFREML and ASREML were used to represent, respectively, the constrained and unconstrained multivariate estimates of type B genetic correlations.

For both multivariate methods, input data structures were modified (Table 5-2) in order to estimate genetic variances and covariances for the same trait measured in different environments (type B) rather than for different traits measured on the same individuals (type A). Convergence criteria were set as MVFV (Minimum Variance of Function Values in Simplex) ≤ 10^[exponent illegible] for MTDFREML and |(-2L_{n+1}) - (-2L_n)| ≤ 0.002 for ASREML, following the instructions of the program manuals (Boldman et al. 1995; Gilmour et al. 1997). True genetic variance-covariance components were used as priors for starting the iterative processes. For both multivariate methods, two different groupings of environments were used to estimate variance components, based on the assumption that estimates of genetic variances and covariances are system dependent, so that estimates of type B genetic correlations are not identical when they are estimated pair-wise or over all sites. The small grouping contained only pairs of environments, while the larger grouping contained all four environments and estimated all pair-wise genetic correlations simultaneously.

Table 5-2. Illustration of the data structure used in multivariate analysis to estimate type B genetic correlations. The experimental design is assumed to be a randomized complete block design with 3 environments, each having 3 blocks with one tree per family per block. Observations from different environments are treated as different traits.

Environment   Block   Family   Trait 1   Trait 2   Trait 3
1             1       1        10.51     .         .
1             1       2        9.83      .         .
1             1       3        7.78      .         .
1             2       1        8.39      .         .
1             2       2        7.67      .         .
1             2       3        6.78      .         .
1             3       1        12.34     .         .
1             3       2        11.23     .         .
1             3       3        10.98     .         .
2             4       1        .         8.65      .
2             4       2        .         8.21      .
2             4       3        .         7.67      .
2             5       1        .         9.69      .
2             5       2        .         8.76      .
2             5       3        .         8.65      .
2             6       1        .         6.67      .
2             6       2        .         7.43      .
2             6       3        .         5.89      .
3             7       1        .         .         12.34
3             7       2        .         .         13.45
3             7       3        .         .         10.56
3             8       1        .         .         13.45
3             8       2        .         .         12.56
3             8       3        .         .         11.98
3             9       1        .         .         12.17
3             9       2        .         .         13.64
3             9       3        .         .         15.48

Note that blocks 1-3 are from environment 1, blocks 4-6 are from environment 2 and blocks 7-9 are from environment 3. Dots stand for missing values, which are necessary and intentionally given in this data structure.
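
The restructuring described for Table 5-2 can be sketched as a small transformation from one record per tree to one trait column per environment. This is an illustration only: the function name and tuple layout are hypothetical, and the actual input file formats required by MTDFREML and ASREML are not reproduced here.

```python
def to_multitrait(records, n_env):
    """Spread long-format records into one trait column per environment.

    records: iterable of (environment, block, family, value) tuples,
             with environments numbered 1..n_env.
    Returns rows [environment, block, family, trait_1, ..., trait_n_env],
    where traits not measured in that environment are None (missing).
    """
    rows = []
    for env, block, family, value in records:
        traits = [None] * n_env
        traits[env - 1] = value          # the observation becomes "trait env"
        rows.append([env, block, family] + traits)
    return rows

# Three records taken from Table 5-2 (environment, block, family, value).
long_data = [(1, 1, 1, 10.51), (2, 4, 1, 8.65), (3, 7, 1, 12.34)]
wide = to_multitrait(long_data, n_env=3)
for row in wide:
    print(" ".join("." if x is None else str(x) for x in row))
```

Each tree contributes a value to exactly one trait column, so no residual covariance between traits can be estimated from such data; only the genetic covariances across environments are informed by families that occur in more than one environment.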
Univariate methods

Univariate  methods  used  in  this  study  included  the  traditional  methods  of  Yamada 
(1962),  Burdon  (1977)  and  a GCA  approach  (Chapter  4 of  this  dissertation).  A previous 
comparative  simulation  study  suggested  that  the  GCA  approach  generally  yields  more 
desirable  properties  of  the  estimates  of  type  B genetic  correlation  when  data  are  severely 
unbalanced  and  heterogeneous  variances  exist  among  environments  (Chapter  4 of  this 
dissertation). 

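For the Yamada method, the type B genetic correlation is estimated as:

\[
\hat{r}_B = \frac{\hat{\sigma}^2_g}{\hat{\sigma}^2_g + \hat{\sigma}^2_{ge} - \tfrac{1}{2}\left(\hat{\sigma}_{g_1} - \hat{\sigma}_{g_2}\right)^2} \tag{5-4}
\]

where $\hat{r}_B$ is the estimated type B genetic correlation, $\hat{\sigma}^2_g$ is the genetic variance component estimated from a two-way analysis of variance involving data from two environments assuming homogeneous variance between them, $\hat{\sigma}^2_{ge}$ is the estimate of the variance component for the effect of G x E interaction, and $\hat{\sigma}^2_{g_1}$ and $\hat{\sigma}^2_{g_2}$ are, respectively, the estimates of the genetic variance components within environments 1 and 2. Often used in forest genetic studies is a simplified formula of Eq. 5-4, which is:

\[
\hat{r}_B = \frac{\hat{\sigma}^2_g}{\hat{\sigma}^2_g + \hat{\sigma}^2_{ge}} \tag{5-5}
\]

where the elements in Eq. 5-5 are the same as in Eq. 5-4. For convenience, we refer to Eq. 5-4 as Yamada I and to Eq. 5-5 as Yamada II.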
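
A minimal numerical sketch of the two Yamada estimators follows. It assumes the commonly cited forms: Yamada II as the ratio of the genetic variance to the sum of the genetic and G x E variances, and Yamada I as the same ratio with half the squared difference of the within-environment genetic standard deviations subtracted from the denominator. The function names and the example variance components are hypothetical. The example values are the components expected for two environments with genetic standard deviations of 2 and 1 and a true correlation of 0.8, under the textbook two-environment relationships stated in the comments.

```python
import math


def yamada_ii(var_g, var_ge):
    """Simplified estimator: genetic variance over genetic plus G x E variance."""
    return var_g / (var_g + var_ge)


def yamada_i(var_g, var_ge, var_g1, var_g2):
    """Estimator with a correction for unequal within-environment genetic variances."""
    heterogeneity = 0.5 * (math.sqrt(var_g1) - math.sqrt(var_g2)) ** 2
    return var_g / (var_g + var_ge - heterogeneity)


# Assumed expectations for two environments (illustrative, not from the text):
#   var_g  = r_B * sd1 * sd2                     = 0.8 * 2 * 1       = 1.6
#   var_ge = (sd1**2 + sd2**2) / 2 - var_g       = 2.5 - 1.6         = 0.9
r_i = yamada_i(1.6, 0.9, var_g1=4.0, var_g2=1.0)   # approximately 0.80
r_ii = yamada_ii(1.6, 0.9)                          # approximately 0.64
print(round(r_i, 3), round(r_ii, 3))
```

With a fourfold ratio of genetic variances, the simplified form falls well below the assumed true value of 0.8, while the corrected form recovers it. When variance components are estimated with error, the corrected form can exceed 1, whereas the simplified form cannot as long as both components are non-negative.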
With Burdon's method, the type B genetic correlation is estimated as:

\[
\hat{r}_B = \frac{r_{\bar{x}\bar{y}}}{h_{\bar{x}}\, h_{\bar{y}}} \tag{5-6}
\]

where $r_{\bar{x}\bar{y}}$ is the phenotypic correlation between genetic group (i.e., half-sib family) means in environments x and y, and $h_{\bar{x}}$ and $h_{\bar{y}}$ are the square roots of the heritabilities of the genetic group means in environments x and y, respectively.
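
The calculation can be sketched as follows, assuming the estimator is the correlation between family means divided by the product of the square roots of the family-mean heritabilities. The function names, the family means, and the heritability values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def burdon(means_x, means_y, h2_mean_x, h2_mean_y):
    """Family-mean correlation scaled by the family-mean heritabilities."""
    return pearson(means_x, means_y) / math.sqrt(h2_mean_x * h2_mean_y)


# Hypothetical family means for four families in environments x and y.
means_x = [1.0, 2.0, 3.0, 4.0]
means_y = [2.0, 1.0, 4.0, 3.0]

r_within = burdon(means_x, means_y, h2_mean_x=0.8, h2_mean_y=0.5)
r_out = burdon(means_x, means_y, h2_mean_x=0.6, h2_mean_y=0.5)
print(round(r_within, 3), round(r_out, 3))
```

Nothing in the ratio keeps the result at or below 1: with the lower pair of heritabilities the same family-mean correlation of 0.6 gives an estimate above 1, which is why this estimator is treated as unconstrained.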

For the GCA approach, parental GCA effects are first predicted using the technique of univariate best linear unbiased prediction (BLUP) in each environment, and these are adjusted to calculate the type B genetic correlation as:

\[
\hat{r}_B = \frac{r^{*}_{\hat{g}_x \hat{g}_y}}{\overline{r_x r_y}}
\]

where $\hat{r}_B$ is the estimate of the type B genetic correlation, $r^{*}_{\hat{g}_x \hat{g}_y}$ is the Pearson correlation coefficient between adjusted parental GCA predictions in environments x and y using BLUP, $\hat{g}_x$ and $\hat{g}_y$ are the predicted parental GCA effects in environments x and y, respectively, and $\overline{r_x r_y}$ is the mean product of the adjusted 'prediction accuracy' (see Chapter 4 of this dissertation) in the two environments.


Criteria  for  Comparisons 

After the type B genetic correlations were estimated for each pair of environments within each of the 300 simulated data sets using the above univariate and multivariate methods, three main criteria were used to evaluate the estimation methods. First, empirical bias was calculated as the difference between the means of the estimated and true type B genetic correlations over 300 random samples for each pair of environments, i.e., $\mathrm{Bias} = \bar{\hat{r}}_B - \bar{r}_B$, where $\bar{\hat{r}}_B = \frac{1}{N}\sum_{i=1}^{N} \hat{r}_{Bi}$, with $\hat{r}_{Bi}$ being the estimated type B genetic correlation of the i-th sample; $\bar{r}_B = \frac{1}{N}\sum_{i=1}^{N} r_{Bi}$, with $r_{Bi}$ being the true type B genetic correlation of the i-th sample; and N (= 300) is the total number of random samples. The statistical differences of the empirical biases from zero were tested by one-way analysis of variance (ANOVA).

The second criterion was the mean distance (MD) between the estimated and true type B genetic correlations, which was calculated as $\mathrm{MD} = \frac{1}{N}\sum_{i=1}^{N} \left| \hat{r}_{Bi} - r_{Bi} \right|$. The smaller the MD, the closer the estimates are to their true values and, consequently, the higher the estimation precision. The third criterion was the simple correlation between the estimates of type B genetic correlations and the true type B genetic correlations. A higher correlation reflects a better response of the estimated type B genetic correlation to changes in the true type B genetic correlation and thus indicates better quality of the estimation method. Outliers were excluded if their distances to the true values exceeded three times the MD.
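
The three criteria and the outlier rule can be sketched as below. The function names and the small data set are hypothetical. The outlier threshold is computed from the MD of all samples, including the outliers themselves, which is an assumption because the text does not say how the threshold MD was obtained.

```python
import math


def bias(est, true_vals):
    """Mean of the estimates minus mean of the true values."""
    return sum(est) / len(est) - sum(true_vals) / len(true_vals)


def mean_distance(est, true_vals):
    """Mean absolute distance between estimates and true values."""
    return sum(abs(e - t) for e, t in zip(est, true_vals)) / len(est)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_outliers(est, true_vals, factor=3.0):
    """Keep samples whose distance to the true value is at most factor * MD."""
    limit = factor * mean_distance(est, true_vals)
    kept = [(e, t) for e, t in zip(est, true_vals) if abs(e - t) <= limit]
    return [e for e, _ in kept], [t for _, t in kept]


# Hypothetical true and estimated type B genetic correlations for five samples.
true_vals = [0.5, 0.6, 0.7, 0.8, 0.9]
est = [0.6, 0.5, 0.8, 0.7, 1.0]

print(round(bias(est, true_vals), 3))           # about 0.02
print(round(mean_distance(est, true_vals), 3))  # about 0.1
print(round(pearson(est, true_vals), 3))

# A sixth sample with a very large estimate is removed by the outlier rule.
kept_est, kept_true = drop_outliers(est + [2.5], true_vals + [0.7])
```

In this toy example every estimate misses by 0.1, yet the bias is only about 0.02 because the errors largely cancel. This is why bias and MD are reported as separate criteria.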

Results 


Bias 

Among the univariate methods, which are always applied to pairs of environments, the GCA approach yielded empirically unbiased estimates of type B genetic correlations for all balanced and unbalanced data sets (Table 5-3). Burdon's method yielded unbiased estimates when data were balanced within an environment but yielded severely biased estimates when data were unbalanced due to missing values. Yamada I (Eq. 5-4) also produced nearly unbiased estimates for almost all data sets. Yamada II (Eq. 5-5) tended to yield slightly downward-biased estimates, and biases for a few environment pairs became severe when the ratio of genetic variances in the two environments was greater than 2.

The unconstrained multivariate method ASREML and the constrained multivariate method MTDFREML both yielded empirically unbiased estimates of type B genetic correlations when either all four environments or only two environments were included in a closed analytical system (Table 5-3). MTDFREML tended to yield slightly downward biases, but the magnitudes were nearly negligible. For a given pair of environments, however, the large and small groupings differed in their estimates of type B genetic correlations, as indicated by the imperfect correlation between estimates from the two analytical systems (data not shown). This implied that, with multivariate methods, information within an analytical system is shared among environments, so that data from a third environment can influence the estimate of the type B genetic correlation between the other two environments.

Comparison  of  the  best  results  from  univariate  methods  with  those  of  multivariate 
methods  in  terms  of  bias  did  not  produce  obvious  differences.  The  univariate  methods,  such 
as  the  GCA-approach  and  Yamada  I,  produced  estimates  as  unbiased  as  those  from 
multivariate  methods  for  almost  all  data  structures  simulated  in  this  study.  Other  univariate 
methods, such as Burdon's method and Yamada II, did, however, yield empirically biased estimates of type B genetic correlations when data were unbalanced due to missing values
or  when  heterogeneity  of  variances  among  environments  was  large.  Such  results  were 
consistent  with  previous  theoretical  considerations  and  empirical  studies  about  these 
univariate  methods  (Fernando  et  al.  1984;  Ito  and  Yamada  1990;  Dutilleul  and  Carriere 
1998). 


Table 5-3. Empirical biases of type B genetic correlation estimates from different estimation methods for simulated half-sib data tested in a randomized complete block experimental design with single-tree plots.

Mortality  Estimation Method  Environment Pairs
                              1-2      1-3      1-4      2-3      2-4      3-4
0%         MTDFREML 1†        -0.011   -0.011   -0.014   0.015    0.017    0.019
           MTDFREML 2         -0.003   0.001    0.015    0.003    0.009    0.015
           ASREML 1           0.005    0.008    0.067*   0.008    0.069*   0.053*
           ASREML 2           0.002    0.009    0.058    0.036    0.047    0.044
           GCA-approach       0.002    0.008    0.031    0.000    0.019    0.009
           Yamada I           0.000    0.008    0.031    0.006    0.023    0.017
           Yamada II          -0.018*  -0.051*  -0.148*  -0.019   -0.082*  -0.034
           Burdon             0.002    0.008    0.031    0.000    0.019    0.009
30%        MTDFREML 1         -0.023*  -0.019   -0.039   0.017    -0.004   0.020
           MTDFREML 2         -0.009   -0.001   -0.012   0.003    -0.036   0.022
           ASREML 1           0.006    0.020    0.110*   0.022    0.074*   0.098*
           ASREML 2           0.006    0.021    0.134*   0.024    0.085*   0.104*
           GCA-approach       0.004    0.022    0.049    0.018    0.047    0.033
           Yamada I           -0.004   0.015    0.047    0.022    0.041    0.038
           Yamada II          -0.027*  -0.053*  -0.153*  -0.013   -0.091*  -0.031
           Burdon             -0.232*  -0.192*  -0.130*  -0.161*  -0.077*  -0.022

True genetic parameters for different environments are given in Table 5-1. † MTDFREML (or ASREML) 1 and 2 refer to the 4-environment and 2-environment groupings, respectively. * Biases are significantly different from zero at the probability level α ≤ 0.05.

Precision

The  mean  distances  (MD)  between  the  estimated  and  true  type  B genetic  correlations 
from  different  estimation  methods  showed  consistent  differences  among  the  estimation 
methods  (Table  5-4).  Constrained  estimation  methods  (i.e.,  MTDFREML  and  Yamada  II) 
had smaller MDs than the unconstrained methods (i.e., ASREML, Yamada I, Burdon, and the GCA approach). For the constrained multivariate method MTDFREML, the 4-environment grouping consistently had a smaller MD than the 2-environment grouping, but this was not always true for the unconstrained multivariate method ASREML. Among all estimation
methods,  MTDFREML  with  a 4-environment  system  persistently  yielded  the  smallest  MD 
for  a given  pair  of  environments.  This  was  followed  by  the  MTDFREML  with  two- 
environment  system,  Yamada  II,  and  then  the  GCA  approach  and  ASREML.  The  difference 
between  constrained  methods  and  unconstrained  methods  in  MD  became  even  larger  when 
heritabilities  in  two  environments  were  lowered  (Table  5-4). 

Regardless of estimation method, the mean distance between estimated and true type B genetic correlations became steadily larger when heritabilities in the two environments were lowered (Table 5-4). For example, for all methods except Burdon's, MDs were not greater than 0.1 between environments 1 and 2, which had heritabilities of 0.4 and 0.3, respectively. In contrast, MDs were greater than 0.2 for almost all methods between environments 3 and 4, which had true heritabilities of only 0.2 and 0.1, respectively.

Table 5-4. Mean distance (MD) between estimates of type B genetic correlation and their true values for different estimation methods using simulated half-sib families tested in a randomized complete block experimental design with single-tree plots.

Mortality  Estimation Method  Environment Pairs
                              1-2      1-3      1-4      2-3      2-4      3-4
0%         MTDFREML 1†        0.060    0.092    0.137    0.119    0.150    0.187
           MTDFREML 2         0.067    0.100    0.166    0.124    0.188    0.205
           ASREML 1           0.076    0.108    0.217    0.126    0.226**  0.232**
           ASREML 2           0.080    0.108    0.209    0.126    0.205    0.218
           GCA-approach       0.077    0.106    0.192    0.124    0.197    0.210
           Yamada I           0.071    0.108    0.193    0.126    0.193    0.208
           Yamada II          0.069    0.104    0.176    0.122    0.166    0.180
           Burdon             0.077    0.106    0.192    0.124    0.197    0.210
30%        MTDFREML 1         0.069    0.107    0.176    0.132    0.184    0.230
           MTDFREML 2         0.081    0.125    0.212    0.124    0.200    0.293**
           ASREML 1           0.099    0.146    0.331**  0.157    0.283**  0.306**
           ASREML 2           0.096    0.148    0.350**  0.157    0.282**  0.316**
           GCA-approach       0.100    0.149    0.316**  0.156    0.300**  0.332**
           Yamada I           0.087    0.142    0.304**  0.155    0.279**  0.303**
           Yamada II          0.084    0.129    0.218    0.144    0.207    0.239
           Burdon             0.245**  0.255**  0.305**  0.239**  0.313**  0.293**

True genetic parameters for different environments are given in Table 5-1. Out-of-bound estimates were accepted with their original values. † MTDFREML (or ASREML) 1 and 2 refer to the 4-environment and 2-environment groupings, respectively. ** indicates that the MD from an estimation method is significantly greater than the MD calculated from the commonly used method of Yamada II.


Correlations  Between  Estimated  and  True  Type  B Genetic  Correlations 

Pearson correlation coefficients (calculated based on 300 randomly simulated data samples) between the estimated and true type B genetic correlations for a given pair of environments were generally low for all estimation methods (Table 5-5). Considerable differences existed, however, among estimation methods. For the constrained multivariate method, the grouping containing only two environments yielded higher Pearson correlation coefficients between the true and estimated type B genetic correlations than the grouping that included all 4 environments (Table 5-5). Among all estimation methods, the correlation coefficients were highest for the univariate GCA approach, which was followed by Yamada II and then the multivariate method MTDFREML with the 2-environment grouping. Burdon's method was again equal to the GCA approach when data were balanced within an environment, but inferior to the GCA approach when data were unbalanced within an environment. Results for Yamada I and ASREML were between those of the GCA approach and Burdon's method.

Table 5-5. Pearson correlation coefficients between estimates of type B genetic correlations and their underlying true values for different estimation methods using simulated half-sib families tested in a randomized complete block experimental design with single-tree plots.

Mortality  Estimation Method  Environment Pairs
                              1-2      1-3      1-4      2-3      2-4      3-4
0%         MTDFREML 1†        0.264    0.278    0.279    0.228    0.399    0.207
           MTDFREML 2         0.314    0.336    0.319    0.336    0.424    0.361
           ASREML 1           0.282    0.292    0.254    0.273    0.358    0.272
           ASREML 2           0.252    0.321    0.319    0.193    0.236    0.271
           GCA-approach       0.397    0.405    0.381    0.411    0.454    0.363
           Yamada I           0.311    0.302    0.354    0.354    0.407    0.347
           Yamada II          0.313    0.296    0.360    0.360    0.424    0.365
           Burdon             0.397    0.405    0.381    0.411    0.454    0.363
30%        MTDFREML 1         0.128    0.236    0.162    0.219    0.233    0.075
           MTDFREML 2         0.204    0.290    0.242    0.367    0.269    0.203
           ASREML 1           0.160    0.234    0.082    0.262    0.256    0.165
           ASREML 2           0.181    0.260    0.100    0.257    0.214    0.141
           GCA-approach       0.298    0.329    0.336    0.339    0.360    0.235
           Yamada I           0.205    0.254    0.192    0.280    0.289    0.197
           Yamada II          0.226    0.268    0.270    0.303    0.332    0.247
           Burdon             0.237    0.181    0.162    0.197    0.046    0.067

True genetic parameters for different environments are given in Table 5-1. Out-of-bound estimates were accepted with their original values. † MTDFREML (or ASREML) 1 and 2 refer to the 4-environment and 2-environment groupings, respectively.

Distribution  of  Estimates 

For a given true underlying type B genetic correlation, various estimates were obtained from random data samples due to sampling errors. The scatter plots of estimated type B genetic correlations against their true values were affected by both the estimation methods and the true genetic parameters. The multivariate method MTDFREML and the univariate method of Yamada II (Eq. 5-5) constrained estimates of type B genetic correlations to be no greater than 1 and, consequently, skewed the distribution of estimates when the true type B genetic correlation was close to 1 (Figures 5-1 and 5-2). The multivariate method ASREML and all other univariate methods, on the other hand, allowed for out-of-bound estimates, which made the distribution of estimates appear nearly symmetric about the true values. For unconstrained univariate and multivariate methods, however, some very large estimates were also produced. The frequency of out-of-bound estimates steadily increased as heritabilities in one or both of the environments became smaller (Figure 4-1). For example, between environments 2 and 3, which had heritabilities of 0.3 and 0.2, respectively, there were only 21 (out of 300) estimates greater than 1.0. In contrast, between environments 1 and 4, which had heritabilities of 0.4 and 0.1, respectively, more than 60 (out of 300) estimates were greater than 1.0, although the designed true parameters for both environment pairs were 0.7.

Figure 5-1. Scatter plots of estimates of type B genetic correlations from multivariate and univariate methods against true type B genetic correlations for 300 random samples. Panels: (a) multivariate MTDFREML; (b) multivariate ASREML; (c) univariate Yamada II; (d) univariate GCA approach. True parameters are: $h_1^2 = 0.4$, $h_2^2 = 0.3$, and $r_B = 0.9$. Experimental designs in both environments are randomized complete block with one tree per family per block. In each environment, there are 20 blocks and 90 half-sib families.

Figure 5-2. Scatter plots of estimates of type B genetic correlations from multivariate and univariate methods against the true type B genetic correlations for 300 random samples. Panels: (a) multivariate MTDFREML; (b) multivariate ASREML; (c) univariate Yamada II; (d) univariate GCA approach. True parameters are: $h_1^2 = 0.2$, $h_2^2 = 0.1$, and $r_B = 0.5$. Experimental designs in both environments are randomized complete block with one tree per family per block. In each environment, there are 20 blocks and 90 half-sib families.

Discussion

Results of simulations in this study demonstrated that estimates of type B genetic correlations using multivariate methods were empirically as unbiased as the best results from univariate methods (Table 5-3) for both balanced and unbalanced data with heterogeneous genetic and error variances. Although a tendency toward slightly downward bias was detected for the constrained multivariate method MTDFREML, the magnitudes of such biases were very small, hence negligible. This tendency was possibly caused by constraining estimates within the theoretical parameter space, some of which would otherwise be out-of-bounds. The small magnitudes of such downward biases were probably due to: (1) the relatively small proportion of estimates of type B genetic correlations that would be out-of-bounds between environments having higher heritabilities, so that constraining them within the parameter space caused little change to the mean of estimates; and (2) potential compensation by upward biases of estimates of type B genetic correlations for pairs of environments having low heritabilities. For traits of low heritability, there is a high probability that the estimates of genetic variance would be zero or negative (Hill and Thompson 1978). The practice of setting estimates of type B genetic correlations to zero for data samples with zero or negative estimates of genetic variances is likely to yield upwardly biased estimates of type B genetic correlation in prolonged use.

The  empirically  unbiased  estimates  of  type  B genetic  correlations  from  the  univariate 
methods  of  Yamada  I and  II  were  likely  due  to  the  specific  data  structures  simulated  in  this 
study.  Previous  numerical  studies  (Fernando  et  al.  1984;  Dutilleul  and  Carriere  1998; 
Chapter 4 of this dissertation) indicated that estimates of type B genetic correlations from Yamada's methods were subject to bias when heterogeneity of variances was severe and data
were  highly  unbalanced  among  environments  in  terms  of  their  relative  sizes.  In  this 
simulation  study,  data  samples  had  about  the  same  sizes  across  all  environments,  although 
the  genetic  and  environmental  variances  were  heterogeneous.  As  a result,  biases  from 
Yamada’s  methods  were  less  severe  or  negligible. 

Because  each  estimation  method  was  used  to  analyze  the  same  data  sets,  differences 
among  estimation  methods  in  MD  reflected  their  differential  estimation  precisions.  The 
smaller  MDs  (Table  5-4)  obtained  from  constrained  estimation  methods  (MTDFREML  and 
Yamada  II)  than  from  unconstrained  methods  (ASREML,  Yamada  I,  Burdon,  and  the  GCA 
approach)  were  expected  because,  by  theory,  the  true  values  of  type  B genetic  correlations 
cannot be located outside the parameter space. Therefore, higher estimation precision can
simply  be  achieved  by  restraining  estimates  of  genetic  correlations  from  being  out  of  bounds, 
which  consequently  narrows  the  confidence  interval  of  estimates  even  though  such 
confidence  intervals  may  not  be  symmetric. 

The smaller MD for the multivariate method MTDFREML when it used data from all four environments to estimate the pairwise type B genetic correlation, compared with the two-environment grouping, may be due to more stable estimates of genetic variances from the four-environment grouping. Evidence supporting this reasoning is that the standard deviation of estimates of genetic variance among the 300 random data samples for a given environment was slightly smaller for the four-environment grouping than for the two-environment grouping. This may suggest that, for multivariate methods, additional information in a larger grouping system helped improve the quality of estimates of genetic variance components so that the sampling error for the type B genetic correlation was smaller.

The  magnitudes  of  MDs  were  surprisingly  large  for  estimates  of  type  B genetic 
correlations  between  pairs  of  environments  with  traits  of  low  heritabilities.  In  the  simulation, 
the  MDs  increased  substantially  from  balanced  data  with  20  single-tree  blocks  to  unbalanced 
data  with  an  average  of  14  single-tree  blocks.  With  such  large  sampling  errors,  the  biases  of 
the estimates may become less meaningful. To obtain reliable estimates of type B genetic correlations, a relatively large number of replications and families may be required in the field experimental designs.

Besides the properties of unbiasedness and precision, the correlation between estimated and true type B genetic correlations could be important because it reflects the response of estimates to changes in the underlying true values. In this study, the generally low correlation between the estimated and true type B genetic correlations was attributable to the large sampling errors of type B genetic correlations, as indicated by the range of estimates for a given true value (Figure 5-2). For the constrained multivariate method, a smaller system containing only two environments seemed to be more desirable in type B genetic correlation estimation than a larger system containing more environments. Interference by information from environments outside the pair of interest may have reduced the correspondence of estimates of type B genetic correlation to their true values. This trend was clearly demonstrated by MTDFREML, with higher correlation coefficients from the two-environment system than from the four-environment system for various genetic backgrounds. The two-environment system yielded larger MDs, however, than the four-environment system. The reason for this result was not clear; it was probably due to differences between the two grouping systems in variance component estimation.

While constraining estimates of type B genetic correlation within the theoretical parameter space increased the overall estimation precision, it also skewed the distribution of estimates and caused downward biases. For example, scatter plots of estimates of type B genetic correlations against their true values indicated that for the unconstrained methods, multivariate ASREML and the univariate GCA approach, the estimates were nearly symmetrically distributed for a given true value (Figures 5-1b and 5-1d). In contrast, for the constrained multivariate method MTDFREML and the univariate method Yamada II, the distribution of estimates (Figures 5-1a and 5-1c) was not symmetric about the true values because of the theoretical boundary, which altered the distributional pattern of estimates.
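
The trade-off between precision and bias can be illustrated with a toy calculation. The four estimates below are invented and are symmetric about an assumed true value of 0.9. The helper `constrain` and its lower bound of -1 are assumptions for illustration only and do not reproduce how MTDFREML or Yamada II enforce the parameter space.

```python
def bias(est, true_value):
    """Mean of the estimates minus the single true value."""
    return sum(est) / len(est) - true_value


def mean_distance(est, true_value):
    """Mean absolute distance of the estimates from the true value."""
    return sum(abs(e - true_value) for e in est) / len(est)


def constrain(est, lower=-1.0, upper=1.0):
    """Clip each estimate into the theoretical range of a correlation."""
    return [min(max(e, lower), upper) for e in est]


true_rb = 0.9
unconstrained = [0.6, 0.8, 1.0, 1.2]      # symmetric about 0.9
constrained = constrain(unconstrained)    # 1.2 is pulled back to 1.0

for label, est in (("unconstrained", unconstrained), ("constrained", constrained)):
    print(label, round(bias(est, true_rb), 3), round(mean_distance(est, true_rb), 3))
```

Clipping the single out-of-bound estimate lowers the MD from about 0.20 to about 0.15 but moves the bias from about zero to about -0.05, and the clipped estimates are no longer symmetric about the true value.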

The choice of constrained or unconstrained estimates of type B genetic correlations may be objective specific. For example, in theoretical studies of the distribution and sampling errors of type B genetic correlation, unconstrained estimates may be more desirable because they show the original distributional pattern of estimates, so that the potential confidence intervals of estimates can be investigated in an unbiased manner. In practical genetic data analysis, however, constrained estimates of type B genetic correlations may be easier to interpret and more reasonable to apply when they are involved in indirect selection. For a given set of values of heritabilities, phenotypic variance, and selection intensities, the genetic response from indirect selection is theoretically less than or equal to the gain from direct selection (Falconer 1989). If estimates of genetic correlations greater than 1 were used, however, this theoretical rule would be violated, yielding more predicted gain from indirect selection than from direct selection.
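
A short sketch of this rule, using the standard quantitative-genetic ratio of correlated to direct response (genetic correlation times the ratio of selection intensity times square root of heritability in the two environments). The function name and the numerical values are hypothetical, and equal heritabilities and selection intensities are assumed so that the ratio reduces to the type B genetic correlation itself.

```python
import math


def relative_efficiency(r_b, h2_x, h2_y, i_x=1.0, i_y=1.0):
    """Ratio of the correlated response in environment y (selection in x)
    to the direct response in y: r_B * (i_x * h_x) / (i_y * h_y)."""
    return r_b * (i_x * math.sqrt(h2_x)) / (i_y * math.sqrt(h2_y))


# Equal heritabilities and intensities: the ratio equals r_B.
within_bounds = relative_efficiency(0.8, h2_x=0.3, h2_y=0.3)
out_of_bounds = relative_efficiency(1.2, h2_x=0.3, h2_y=0.3)
print(round(within_bounds, 2), round(out_of_bounds, 2))
```

With an out-of-bound estimate of 1.2, the predicted indirect gain is 20% larger than the direct gain under these assumptions, which is the violation described above.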

Although  univariate  methods,  such  as  the  GCA  approach,  can  achieve  estimates  of 
type  B genetic  correlations  as  unbiased  as  those  from  unconstrained  multivariate  methods 
for  balanced  and  unbalanced  data,  it  may  be  practically  more  convenient  and  efficient  to  use 
multivariate methods if such computer software is available. In addition to the property
of  constrained  estimates  with  empirical  unbiasedness  and  higher  precision,  multivariate 
methods  offer  additional  advantages  for  data  having  multiple  generations  or  genetic 
relatedness  that  cannot  currently  be  accounted  for  by  univariate  methods.  These  benefits 
would  include:  (1)  multivariate  methods  can  make  use  of  pedigree  information  so  that  type 
B genetic  correlations  can  be  estimated  between  two  environments  that  have  only  indirect 
genetic  connectedness,  (2)  genetic  relatedness  among  genetic  groups  within  and  between 
environments  can  be  accounted  for  so  that  the  assumption  of  independence  among  genetic 
groups  with  univariate  methods  can  be  relaxed,  and  (3)  there  is  greater  flexibility  to  include 
data  from  mixtures  of  mating  designs  and  different  generations.  As  tree  improvement 
programs  progress  into  advanced  generations  and  data  structures  become  more  complicated 
(White  1996),  multivariate  methods  could  be  more  appropriate  in  estimating  type  B genetic 
correlations. 

Conclusion 

Although  some  univariate  methods  can  yield  unbiased  estimates  of  type  B genetic 
correlation  for  unbalanced  data  with  heterogeneous  variances,  advantages  associated  with 
multivariate methods make them a viable option in the estimation of type B genetic correlations. Estimates of type B genetic correlations from multivariate methods are
empirically  unbiased  for  unbalanced  data  with  heterogeneous  variances.  Constraining 
estimates  within  theoretical  parameter  space  improves  estimation  precision  and  practical 
application.  Using  more  environments  in  an  analytical  system  with  multivariate  methods  not 
only  increases  computational  efficiency,  but  may  also  enhance  the  quality  of  estimates  of 
genetic  variances,  resulting  in  smaller  sampling  errors  of  the  estimates  of  type  B genetic 
correlations. 


CHAPTER  6 
CONCLUSIONS 


For  tree  improvement  programs,  proper  and  effective  analyses  of  data  from  genetic 
tests  are  crucial  for  obtaining  accurate  information  about  genetic  parameters  and  for 
achieving  maximum  genetic  gains  by  correctly  predicting  breeding  values,  ranking 
candidates  and  making  selections.  The  application  of  mixed  model  methods  in  forest  genetic 
data  analysis  greatly  enhances  the  effectiveness  of  handling  unbalanced  data  and  improves 
the  properties  of  predicted  breeding  values.  The  use  of  incomplete  mixed  linear  models  with 
respect to forest genetic experiments helps reduce computational demands for large data sets,
but  may  adversely  affect  the  quality  of  data  analysis  by  producing  considerable  biases.  The 
relative  consequences  of  biases  produced  by  incomplete  mixed  linear  models  are  especially 
severe  for  traits  under  weak  additive  genetic  control  but  with  strong  influence  of  dominance 
and  genotype  x environment  (G  x E)  interactions. 

For mixed linear models excluding genotype x environment interaction, the magnitudes of biases for estimated heritabilities and predicted breeding values are linearly related to the number of testing environments in which progeny tests are arranged. The fewer the environments involved, the larger the potential biases. Single-site data analysis represents the extreme example, in which the estimates of additive genetic variance components are upwardly biased by the entire G x E interaction variance components.

If dominance effects are excluded by mixed linear models in the analysis of full-sib data, biases in the estimates of heritability and predicted genetic gains are linearly proportional to the number of crosses in which each parent is involved. The fewer the crosses per parent, the larger the potential biases. The commonly used circular mating design generally has fewer crosses per parent than other mating designs such as the half-diallel; therefore, potentially larger biases would be expected for the circular mating design if dominance effects are strong and not included in analytical mixed models.

The  purely  additive  genetic  model,  which  is  often  used  in  animal  genetic  data 
analysis,  ignores  both  dominance  effects  and  G x E interactions  and  is  often  unsuitable  in 
forest genetic data analysis for several reasons: (1) economically important traits of most
timber  species  are  usually  associated  with  low  heritability;  (2)  dominance  effects  and  G x E 
interactions  are  reasonably  strong  for  many  important  traits  of  forest  species;  and  (3)  the 
purely  additive  model  accumulates  biases  from  ignoring  both  G x E interaction  and 
dominance. 

In  addition  to  biases  in  estimating  genetic  parameters,  incomplete  analytical  mixed 
linear  models  generally  give  false  information  about  the  quality  of  breeding  value  and 
genetic  gain  prediction.  By  upwardly  biasing  the  estimates  of  additive  genetic  variance 
components, most incomplete mixed models tend to overpredict genetic responses from
selection  and  overstate  the  reliability  of  predicted  breeding  values.  Therefore,  interpretation 
of  results  from  incomplete  mixed  linear  models  must  be  conducted  with  caution, 
understanding  that  upward  biases  may  exist. 
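
A small sketch shows how an inflated additive variance carries through to predicted gain. It uses the standard breeder's equation R = i * h^2 * sigma_P and assumes, for simplicity, that the phenotypic variance estimate is unaffected by the incomplete model. The function names and the numbers in the usage lines are hypothetical and are not results from this study.

```python
import math

def predicted_gain(var_a, var_p, intensity=1.0):
    """Breeder's equation R = i * h^2 * sigma_P, with h^2 = var_a / var_p."""
    h2 = var_a / var_p
    return intensity * h2 * math.sqrt(var_p)

def gain_inflation(var_a_true, bias, var_p, intensity=1.0):
    """Ratio of the gain predicted from a biased additive variance to the true gain.

    Assumes the bias is added to the additive variance estimate only and
    that var_p is estimated without bias (a simplifying assumption).
    """
    biased = predicted_gain(var_a_true + bias, var_p, intensity)
    true = predicted_gain(var_a_true, var_p, intensity)
    return biased / true

if __name__ == "__main__":
    # The same absolute bias, applied to a weakly and a strongly heritable trait
    print(gain_inflation(0.10, 0.05, 1.0))
    print(gain_inflation(0.40, 0.05, 1.0))
```

With the same absolute bias, the relative overprediction is larger for the trait with the smaller additive variance, which is the pattern described above for traits under weak additive genetic control.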

In  circumstances  where  incomplete  mixed  linear  models  need  to  be  used  to  facilitate 
data  analyses,  the  best  choice  would  be  to  ignore  the  dominance  x environment  interaction. 
Potential biases from this incomplete mixed model are generally downward and several-fold
smaller  than  the  biases  from  incomplete  models  ignoring  other  non-additive  genetic  effects. 
At  the  same  time,  the  computational  demands  are  greatly  reduced  due  to  the  large  number 
of  degrees  of  freedom  associated  with  the  dominance  x environment  interaction. 

Despite  the  potential  biases  of  incomplete  linear  models  in  heritability  estimation  and 
breeding  value  prediction,  the  use  of  incomplete  mixed  models  does  not  affect  actual  genetic 
gains  from  selection.  This  is  because  the  incomplete  models  yield  similar  rankings  of 
candidates to the full model, and, therefore, the selected candidates are largely the same
candidates  whether  from  full  or  incomplete  models. 

The  statistical  relationship  between  an  incomplete  and  a full  linear  model  based  on 
balanced  data  provides  useful  tools  to  investigate  the  potential  biases  from  incomplete  linear 
models. Closed forms can be derived to estimate the biases from a specific incomplete linear
model for a given genetic background and field experimental design. Such formulae can be
extended  to  estimate  biases  for  incomplete  models  when  data  are  moderately  unbalanced  by 
using  the  average  design  parameters  of  an  experiment. 

Type  B genetic  correlations  are  frequently  estimated  in  forest  genetic  data  analyses 
for  purposes  of  studying  G x E interactions  and  making  indirect  selections.  Commonly  used 
univariate methods are derived from balanced data and often become sub-optimal when
data are unbalanced and variances are heterogeneous. A new univariate approach is
proposed in this study, based on parental GCA effects predicted by a univariate
BLUP program in individual environments. Both theoretical considerations and numerical
comparisons have shown this to be a viable choice for estimating type B genetic correlations when
data are highly unbalanced and/or have heterogeneous variances among environments.
Empirical properties of estimates from this new GCA approach include unbiasedness,
relatively  high  precision  and  a better  relationship  with  the  true  underlying  type  B genetic 
correlations. 
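
The idea of working from predicted GCA effects can be sketched as follows. BLUP predictions are shrunk toward zero, so their raw correlation across environments understates the underlying genetic correlation. The sketch assumes a simplified relationship, corr(g_hat_x, g_hat_y) = r_x * r_y * r_B, which holds approximately when prediction errors are independent across environments. This relationship, the function names, and the input values are assumptions made here for illustration; the estimator actually used in this study is the one defined in Chapter 4.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def type_b_from_predicted_gca(gca_x, gca_y, acc_x, acc_y):
    """Rescale the correlation of predicted GCA effects by the two accuracies.

    Assumption (not taken from the source): the correlation of predictions is
    roughly acc_x * acc_y * r_B, so dividing by acc_x * acc_y recovers r_B.
    """
    if not (0 < acc_x <= 1 and 0 < acc_y <= 1):
        raise ValueError("accuracies must lie in (0, 1]")
    return pearson(gca_x, gca_y) / (acc_x * acc_y)

if __name__ == "__main__":
    # Hypothetical predicted GCA effects for four parents in two environments
    gca_x = [0.3, -0.1, 0.2, -0.4]
    gca_y = [0.1, 0.0, 0.3, -0.4]
    print(type_b_from_predicted_gca(gca_x, gca_y, 0.8, 0.7))
```

Nothing in this rescaling keeps the result within [-1, 1], so an estimate of this simple form can fall outside the parameter space when accuracies are low.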

While  univariate  estimation  methods  of  type  B genetic  correlations  are  exclusively 
used  in  forest  genetic  data  analyses  due  to  their  small  computational  demand  and  availability 
of  computer  software,  multivariate  methods  were  shown  in  this  study  to  yield  more  desirable 
estimates.  By  treating  observations  in  different  environments  as  different  traits  and 
incorporating  them  into  a closed  analytical  system,  multivariate  methods  estimate  genetic 
variance  and  covariance  components  simultaneously  using  the  REML  approach. 
Constrained  estimates  of  type  B genetic  correlations  from  multivariate  methods  are  all  within 
the  theoretical  parameter  space  and  empirically  unbiased.  By  eliminating  out-of-bound 
estimates of type B genetic correlation, the average precision and accuracy of estimates can
be considerably improved, because the true type B genetic correlation cannot lie outside
the parameter space. The higher estimation precision of the constrained multivariate method
manifests  itself  more  strikingly  when  heritabilities  of  a trait  under  investigation  are  low  in 
some  environments.  Practically,  constrained  estimates  of  type  B genetic  correlations  are 
more  reasonable  for  interpretation  and  more  appropriate  for  use  in  estimating  genetic  gains 
from  indirect  selections. 

Practical  application  of  multivariate  methods  in  type  B genetic  correlation  estimation 
should become popular when user-friendly computer software is widely available and
genetic testing data have complex relatedness structures. Proper data structures and correct
determination of analytical mixed linear models are vital to using a multivariate program.

EXPECTED SUM OF SQUARES FOR THE EFFECT OF FAMILY X BLOCK
(ENVIRONMENT)  INTERACTION  IN  THE  INCOMPLETE  LINEAR  MODEL  IGNORING 
THE  EFFECT  OF  FAMILY  X ENVIRONMENT  INTERACTION 

APPENDIX 2-2

EXPECTED  SUM  OF  SQUARES  FOR  THE  FAMILY  EFFECT  IN  THE 
INCOMPLETE  LINEAR  MODEL  IGNORING  THE  EFFECT  OF  FAMILY  X 
ENVIRONMENT  INTERACTION 


From Eq. 2-6 (page 13), we have

APPENDIX 4-1

EXPECTED VALUES OF COVARIANCE AND VARIANCES OF ADJUSTED
PARENTAL GCA EFFECTS


From Chapter 4 (page 69), the predicted parental GCA effects are defined as $\hat{g}_x$ and $\hat{g}_y$, and
their true values as $g_x$ and $g_y$, in environments x and y, respectively. Further, let the prediction
accuracies of $\hat{g}_x$ and $\hat{g}_y$ be $r_x$ and $r_y$, respectively, in the two environments; then